refactor(domain): freeze Movie and Episode, switch track collections to tuple

Movie and Episode become @dataclass(frozen=True, eq=False), with audio_tracks/subtitle_tracks held as tuple[...] instead of list[...]. Identity-based equality is preserved via the existing __eq__/__hash__. __post_init__ coercion (imdb_id, title, season_number, episode_number) uses object.__setattr__ to stay compatible with frozen=True. The MediaWithTracks mixin contract is updated to tuple accordingly. Callers projecting enrichment results (probe output, file metadata) now rebuild via dataclasses.replace(...), the same pattern recently adopted for ParsedRelease. Season and TVShow stay mutable for now: freezing the aggregate root would force a full reconstruction on every add_episode, so that change is deferred.
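
For readers unfamiliar with the pattern, here is a minimal sketch of a frozen entity with tuple-backed tracks, __post_init__ coercion through object.__setattr__, and enrichment via dataclasses.replace. The AudioTrack and Movie classes below are hypothetical, trimmed-down stand-ins and not the project's real definitions; the normalization rule and the choice to compare by imdb_id are assumptions made only for illustration.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AudioTrack:
    """Hypothetical stand-in for the real AudioTrack value object."""

    language: str
    codec: str


@dataclass(frozen=True, eq=False)
class Movie:
    """Hypothetical, trimmed-down entity; the real Movie has more fields."""

    imdb_id: str
    title: str
    audio_tracks: tuple[AudioTrack, ...] = ()

    def __post_init__(self) -> None:
        # A frozen dataclass rejects normal attribute assignment, so the
        # coercion has to go through object.__setattr__.
        object.__setattr__(self, "imdb_id", self.imdb_id.strip().lower())
        # Accept any iterable of tracks but always store a tuple.
        object.__setattr__(self, "audio_tracks", tuple(self.audio_tracks))

    # eq=False stops the dataclass from generating field-by-field equality,
    # so these hand-written methods stay in charge (assumed: compare by id).
    def __eq__(self, other: object) -> bool:
        return isinstance(other, Movie) and self.imdb_id == other.imdb_id

    def __hash__(self) -> int:
        return hash(self.imdb_id)


movie = Movie(imdb_id=" TT0133093 ", title="The Matrix")

# Enrichment no longer mutates in place: it builds a new instance.
enriched = replace(movie, audio_tracks=(AudioTrack("en", "EAC3"),))

try:
    movie.title = "changed"  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Note that replace() goes back through __init__, so __post_init__ runs again on the new instance and the coercion stays idempotent as long as it accepts already-normalized values.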
feat(release): add fullwidth vertical bar ｜ (U+FF5C) to separators
2026-05-21 13:40:22 +02:00 · 2026-05-21 08:05:56 +02:00 · 2026-05-21 08:05:46 +02:00 · 2026-05-21 07:54:17 +02:00 · 2026-05-21 07:51:49 +02:00 · 2026-05-21 07:46:13 +02:00
98 changed files with 5069 additions and 1628 deletions
@@ -15,8 +15,372 @@ callers).

 ## [Unreleased]

+### Fixed
+
+- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
+  range.** The parser previously captured `episode=9, episode_end=10`
+  and dropped E11+. It now returns `episode=first, episode_end=last`,
+  with intermediate values implied. Fixture
+  `shitty/archer_multi_episode/` now pins the fixed output instead of
+  documenting the old buggy one.
+- **Apostrophes in titles no longer push the release through the AI
+  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
+  previously parsed with `parse_path="ai"` and everything UNKNOWN
+  because `'` is in the forbidden-chars list. Apostrophes are now
+  pre-stripped before the well-formed check, so the parse completes
+  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
+  the title text loses its apostrophe. `parse_path` becomes
+  `sanitized` to surface the cleanup. Side win: PoP fixture
+  `the_prodigy_full_chaos/` also moves from total failure to a
+  partially-correct parse (year, source, codec extracted).
+- **Season-range markers (`Sxx-yy`) are now recognized as
+  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
+  parsed as `media_type=movie` with `S01-06` glued onto the title.
+  The parser now recognizes the range, sets `season=first`,
+  `media_type=tv_complete`, and removes the marker from the title.
+  `is_season_pack` flips to `true`.
+- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
+  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
+  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
+  `｜`, …) carry no title content and are now filtered out. Side
+  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
+  YouTube wide-pipe no longer leaks into the title.
+
 ### Added

+- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
+  token separator.** Added to `alfred/knowledge/release/separators.yaml`
+  so CJK release names (and the occasional decorative YouTube-style use)
+  tokenize cleanly instead of leaving the wide pipe glued onto an
+  adjacent token. The tokenizer in
+  `alfred/domain/release/parser/pipeline.py` already iterates the
+  separator list as plain strings (no regex), so a multi-byte UTF-8
+  separator works without any code change.
+
+- **`InspectedResult.recommended_action` property** — derived hint that
+  collapses the orchestrator's go / wait / skip decision into a single
+  value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
+  exclusion logic that was previously dispersed across road /
+  media_type / main_video checks at each call site. Ordering is part of
+  the contract: ``skip`` (no main video, or media_type == ``"other"``)
+  wins over ``ask_user`` (media_type == ``"unknown"`` or road ==
+  ``"path_of_pain"``) which wins over ``process``. Surfaced through the
+  ``analyze_release`` tool so the LLM can route on it directly.
+  6 new tests in ``tests/application/test_inspect.py`` cover the four
+  branches and the precedence rules.
+- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
+  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
+  — the surface previously coupled to the concrete `LanguageRegistry`.
+  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
+  depends on the Protocol, infrastructure provides the YAML-backed
+  adapter. Tests in `tests/infrastructure/test_language_registry.py`.
+
+### Changed
+
+- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
+  hold their track collections as `tuple[AudioTrack, ...]` and
+  `tuple[SubtitleTrack, ...]` instead of mutable lists, and are
+  `@dataclass(frozen=True, eq=False)` (identity-based equality
+  preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
+  `object.__setattr__` for the `imdb_id` / `title` /
+  `season_number` / `episode_number` normalizations. To project
+  enrichment results (probe output, file metadata) callers now rebuild
+  via `dataclasses.replace(...)`. Pattern aligned with the recent
+  `ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
+  `tuple` accordingly. `Season` and `TVShow` remain mutable for now:
+  freezing the aggregate root would force a full reconstruction on
+  every `add_episode`, so that change is deferred.
+- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
+  conflated "this might become a placed subtitle" with "this is what a
+  scan pass produced". The class is the output of a scan/identify pass
+  — language/format may still be `None`, confidence reflects how sure
+  the classifier is, and `raw_tokens` holds the filename fragments
+  under analysis. `SubtitleScanResult` says that directly. Pure rename
+  with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
+  no behavior change. Touches the domain entity + `__init__` export,
+  the matcher / identifier / utils services, the manage_subtitles use
+  case, the placer, the metadata store, the shared-media cross-ref
+  comment, and the seven test modules that imported the type.
+
+- **`ParsedRelease` is now frozen; enrichment passes return new
+  instances.** The VO was mutable so `detect_media_type` and
+  `enrich_from_probe` could patch fields in place — a code smell in a
+  value object whose identity *is* its content. `ParsedRelease` is now
+  `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
+  instead of a `list[str]`. `enrich_from_probe` returns a new
+  `ParsedRelease` via `dataclasses.replace` (only allocates when at
+  least one field actually changed). `inspect_release` rebinds
+  `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
+  to satisfy the strict isinstance check that now also runs on
+  replace) and `enrich_from_probe`. Parser pipeline now packs
+  `languages` as a tuple in the assemble dict. Callers updated:
+  `inspect_release`, `testing/recognize_folders_in_downloads.py`, and
+  the enrichment tests (22 call sites + language assertions switched
+  to tuple literals).
+- **`resolve_destination` use cases take `kb` / `prober` as required
+  params; module-level singletons gone.** The four
+  `resolve_{season,episode,movie,series}_destination` use cases now
+  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
+  arguments, matching the shape of `inspect_release`. The module-level
+  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
+  singletons that previously lived in
+  `alfred/application/filesystem/resolve_destination.py` are removed —
+  the application layer no longer reaches into infrastructure. The
+  singletons now live at the agent-tools frontier
+  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
+  instantiate them once and thread them through. `analyze_release` no
+  longer needs the dirty `from ... import _KB` indirection. Tests
+  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
+  of monkeypatching a module attribute.
+- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
+  evoked `pathlib.Path` when reading code, and differed from
+  `parse_path` (the field that holds the value) only in casing style,
+  which made the type harder to tell apart from the attribute.
+  ``TokenizationRoute`` says what it actually captures (DIRECT /
+  SANITIZED / AI = how the name reached the tokenizer), and the class
+  docstring now spells out the orthogonality with ``Road`` (EASY /
+  SHITTY / PATH_OF_PAIN, which captures parser confidence on
+  ``ParseReport``). The ``parse_path`` field name stays unchanged —
+  string values too — so YAML fixtures, the ``analyze_release`` tool
+  spec, and any external consumer are untouched.
+- **`enrich_from_probe` codec mappings moved to YAML.** The three
+  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
+  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
+  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
+  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
+  `ReleaseKnowledge.probe_mappings` (new port field, populated by
+  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
+  parameter and reads the maps from there. Aligns with the CLAUDE.md
+  rule that lookup tables of domain knowledge belong in YAML, not in
+  Python — and opens the door to a future "learn new codec" pass.
+  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
+  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
+- **`ParsedRelease.tech_string` is now a derived `@property`**
+  (`alfred/domain/release/value_objects.py`). It joins `quality`,
+  `source`, and `codec` with dots on every access, so it stays in
+  sync with the underlying fields by construction.
+  gone from the dataclass, the dict returned by `assemble()` no longer
+  carries the key, `parse_release`'s malformed-name fallback drops the
+  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
+  it after filling `quality`/`source`/`codec`. Closes the
+  parser/enrichment double-source-of-truth that `e79ca46` had to fix
+  reactively. The fixtures runner now injects `tech_string` alongside
+  `is_season_pack` since `asdict()` skips properties.
+- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
+  valid levels (global, release_group, movie, show, season, episode)
+  was documented only in a docstring comment and validated nowhere.
+  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
+  serialization, `.value` access) while making the closed set explicit
+  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
+  YAML output is unchanged.
+- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
+  `__init__`.** Same public API (accepts `str | Path`), same behavior,
+  but the dataclass-generated `__init__` is no longer bypassed. One
+  less smell in the shared VOs.
+- **`Language` VO is strict by default; `Language.from_raw()` factory
+  for normalization.** The previous `__post_init__` mutated `iso` and
+  `aliases` via `object.__setattr__` on a frozen dataclass — a code
+  smell hiding behind the dataclass facade. Split: the direct
+  constructor now rejects un-normalized input (uppercase iso,
+  whitespace in aliases, etc.), and `Language.from_raw()` handles
+  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
+  the ISO YAML) needed migration.
+- **`ParsedRelease.normalised` renamed to `clean`.** The field name
+  promised "dots instead of spaces" but in practice held
+  `raw - site_tag - apostrophes` — only used by `season_folder_name()`.
+  Renamed and docstring corrected.
+- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
+  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
+  tolerant `__post_init__` coerced raw strings. With both classes
+  being `(str, Enum)`, the coercion served no purpose. Strict
+  constructor; `.value` no longer passed at call sites; dropped the
+  unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.
+
+### Removed
+
+- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
+  validator. Its only consumer (`MovieService.validate_movie_file`)
+  had been removed during an earlier refactor. The "real movie vs
+  sample" rule now lives in extension-based exclusion
+  (`application/release/supported_media.py`) and PoP. If a size
+  threshold is ever needed, it'll go in a knowledge YAML, not in
+  `settings`.
+
+### Internal
+
+- **Flattened `alfred.domain.shared.media/` package into a single
+  `media.py` module.** The 6-file package (audio, video, subtitle,
+  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
+  LoC module. All 12 import sites continue to resolve unchanged
+  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
+  since `media.py` and `media/__init__.py` both import as
+  `alfred.domain.shared.media`. Easier to scan when the whole
+  bounded context fits on one screen.
+- **`SubtitleKnowledgeBase` types `language_registry` against the
+  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
+  class. The default constructor still instantiates the concrete adapter
+  when no repository is injected — behaviour is unchanged for existing
+  callers. Opens the door to in-memory fakes in future tests without
+  loading the full ISO 639 YAML.
+- **Moved `detect_media_type` and `enrich_from_probe` from
+  `alfred.application.filesystem` to `alfred.application.release`**.
+  They are inspection-pipeline helpers — their natural home is next to
+  `inspect_release`, not next to the filesystem use cases. The move
+  also eliminates a circular-import workaround in
+  `resolve_destination.py`: `inspect_release` can now be imported at
+  module top instead of lazily inside `_resolve_parsed`. Public
+  surface is unchanged for callers that imported the helpers from
+  their full module paths (the only call sites — `inspect.py`, two
+  tests, one testing script — were updated in this commit).
+
+### Added
+
+- **`resolve_*_destination` use cases now consume `inspect_release`**.
+  `resolve_episode_destination` and `resolve_movie_destination` reuse
+  their existing `source_file` parameter as the inspection target;
+  `resolve_season_destination` and `resolve_series_destination` gain
+  a new **optional** `source_path` parameter (also threaded through
+  the tool wrappers and YAML specs). When the path exists, ffprobe
+  data fills tokens missing from the release name (e.g. quality) and
+  refreshes `tech_string`, so the destination folder / file names
+  end up more accurate. When `source_path` is omitted or does not exist
+  on disk (back-compat callers), the use cases fall back to parse-only,
+  with the same behavior as before.
+
+### Fixed
+
+- **`enrich_from_probe` now refreshes `tech_string`** after filling
+  `quality` / `source` / `codec`. Previously the field stayed at its
+  parser-time value, so filename builders saw stale tech tokens even
+  after a successful probe. New `TestTechString` class in
+  `tests/application/test_enrich_from_probe.py` locks the behavior.
+
+### Added
+
+- **`inspect_release` orchestrator + `InspectedResult` VO**
+  (`alfred/application/release/inspect.py`). Single composition of the
+  four inspection layers: `parse_release` → `detect_media_type` (patches
+  `parsed.media_type`) → `find_main_video` (top-level scan) →
+  `prober.probe` + `enrich_from_probe` when a video exists and the
+  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
+  `InspectedResult(parsed, report, source_path, main_video, media_info,
+  probe_used)` that downstream callers consume directly instead of
+  rebuilding the same chain. `kb` and `prober` are injected — no
+  module-level singletons. Never raises.
+
+### Changed
+
+- **`analyze_release` tool now delegates to `inspect_release`** — same
+  output shape, plus two new fields: `confidence` (0–100) and `road`
+  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
+  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
+  both fields so the LLM can route releases by confidence.
+
+- **`MediaProber` port now covers full media probing**: added
+  `probe(video) -> MediaInfo | None` alongside the existing
+  `list_subtitle_streams`. `FfprobeMediaProber` (in
+  `alfred/infrastructure/probe/`) implements both methods and is now
+  the single adapter shelling out to `ffprobe`. The standalone
+  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
+  all callers (tools, testing scripts) instantiate
+  `FfprobeMediaProber` instead. Unblocks the upcoming
+  `inspect_release` orchestrator, which depends on the port.
+
+### Removed
+
+- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
+  `FfprobeMediaProber` adapter).
+
+---
+
+## [2026-05-20] — Release parser confidence scoring + exclusion
+
+### Added
+
+- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
+  `is_supported_video(path, kb)` (extension-only check against
+  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
+  scan, lexicographically-first eligible file, returns `None` when no
+  video qualifies; accepts a bare file as folder for single-file
+  releases). No size threshold, no filename heuristics —
+  PATH_OF_PAIN handles the exotic cases. Foundation for the future
+  `inspect_release` orchestrator.
+
+- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
+  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
+  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
+  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
+  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
+  critical fields. EASY is decided structurally (a group schema
+  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
+  YAML-configurable cutoff (default 60). Weights and penalties also
+  live in `scoring.yaml` — title 30, media_type 20, year 15, season
+  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
+  -30. `Road` is a new enum, distinct from `ParsePath` (which records
+  the tokenization route, not the confidence tier). `ReleaseKnowledge`
+  port gains a `scoring: dict` field.
+
+### Changed
+
+- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
+  ParseReport]` instead of returning a bare `ParsedRelease`. Call
+  sites updated in `application/filesystem/resolve_destination.py` and
+  `agent/tools/filesystem.py`. Tests updated accordingly.
+
+---
+
+## [2026-05-20] — Release parser v2 (EASY + SHITTY)
+
+### Added
+
+- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
+  new annotate-based pipeline (tokenize → annotate → assemble) drives
+  releases from known groups. Exposes `Token` (frozen VO with `index` +
+  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
+  and `GroupSchema` / `SchemaChunk` value objects.
+  - `pipeline.tokenize`: string-ops separator split (no regex), strips
+    a `[site.tag]` prefix/suffix first.
+  - `pipeline.annotate`: detects the trailing group right-to-left
+    (priority to `codec-GROUP` shape, fallback to any non-source dashed
+    token), looks up its `GroupSchema`, then walks tokens and schema
+    chunks in lockstep — optional chunks that don't match are skipped,
+    mandatory mismatches abort EASY and return `None` so the caller can
+    fall back to SHITTY.
+  - `pipeline.assemble`: folds annotated tokens into a
+    `ParsedRelease`-compatible dict.
+  - `parse_release` (in `release.services`) tries the v2 EASY path first
+    and falls through to the legacy SHITTY heuristic on `None`. Legacy
+    SHITTY/PATH OF PAIN behavior is unchanged.
+  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
+    rarbg}.yaml` declare the canonical chunk order per group, loaded via
+    new `ReleaseKnowledge.group_schema(name)` port method.
+  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
+    cover token VOs, site-tag stripping, group detection, schema-driven
+    annotation (movie, TV episode, season pack with optional source),
+    and field assembly.
+
+- **Release parser v2 — enricher pass** completes the EASY pipeline.
+  The structural schema walk now tolerates non-positional tokens
+  between chunks (instead of aborting on leftover tokens), and a second
+  pass tags them with audio / video-meta / edition / language roles.
+  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
+  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
+  matched before single tokens. Channel layouts like `5.1` and `7.1`
+  (split into two tokens by the `.` separator) are detected as
+  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
+  marker so `assemble` extracts the canonical value only from the
+  primary token. KONTRAST releases with audio / HDR / edition / language
+  metadata now produce a fully populated `ParsedRelease`.
+
+- **Streaming distributor as a separate dimension** from encoding source.
+  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
+  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
+  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
+  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
+  platform that produced the release is now recorded distinctly. The
+  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
+  from `sources.yaml`.
+
 - **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
@@ -54,6 +418,22 @@ callers).

 ### Changed

+- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
+  The legacy ~480-line heuristic block in `release/services.py` is gone;
+  `pipeline._annotate_shitty` does a single pass that looks each token
+  up in the kb buckets (resolutions / sources / codecs / distributors /
+  year / `SxxExx`) with first-match-wins semantics, and the leftmost
+  contiguous UNKNOWN run becomes the title. `annotate()` no longer
+  returns `None` — SHITTY is the always-on fallback when no group schema
+  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
+  (`deutschland_franchise_box`, `sleaford_yt_slug`,
+  `super_mario_bilingual`, `predator_space_separators` — the last one
+  moved from `shitty/` → `path_of_pain/`) are now marked
+  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
+  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
+  `xfail_reason` field; the parametrized suite wires the xfail mark
+  automatically.
+
 - **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
@@ -184,6 +564,47 @@ callers).
  globally — noisy on parser mappers and orchestrator use-cases where early-return
  validation is essential complexity. Ignore `PLW0603` for the documented memory
  singleton (`infrastructure/persistence/context.py`).
+- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
+  the last domain → infrastructure leak (`domain/release/value_objects.py`
+  loading YAML at import-time) is gone. Achieved via:
+  - **`ReleaseKnowledge` Protocol port** at
+    `alfred/domain/release/ports/knowledge.py` declares the read-only query
+    surface release parsing needs (token sets for resolutions, sources, codecs,
+    languages, hdr extras; structured dicts for audio, video_meta, editions,
+    media_type_tokens; separators list; file-extension sets used by
+    application/infra callers; `sanitize_for_fs(text)` method).
+  - **`YamlReleaseKnowledge` adapter** at
+    `alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
+    once at construction. Builds an immutable `str.maketrans` translation
+    table for filesystem sanitization.
+  - **`parse_release(name, kb)`** takes the knowledge as an explicit
+    parameter — no more module-level YAML loading inside the domain. Every
+    internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
+    `_extract_audio`, `_extract_video_meta`, `_extract_edition`,
+    `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
+  - **`ParsedRelease` Option B**: sanitization happens once at parse time
+    and is stored on a new `title_sanitized: str` field. Builder methods
+    (`show_folder_name`, `season_folder_name`, `episode_filename`,
+    `movie_folder_name`, `movie_filename`) are now pure — they accept
+    already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
+    arguments. Callers at the use-case boundary sanitize TMDB strings
+    via `kb.sanitize_for_fs(...)` before passing them in.
+  - **All domain-knowledge constants removed from `value_objects.py`**:
+    `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
+    `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
+    `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
+    `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
+    and the `_sanitize_for_fs` helper. The domain module is now pure.
+  - **Application-layer KB singleton**: `resolve_destination.py` instantiates
+    a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
+    threads it through every `parse_release(...)` call. The local
+    `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
+    `_KB.sanitize_for_fs(...)`.
+  - **`detect_media_type(parsed, source_path, kb)` and
+    `find_video_file(path, kb)`** now take the knowledge explicitly
+    instead of importing `_*_EXTENSIONS` constants from the domain.
+    `agent/tools/filesystem.py::analyze_release` imports the application
+    KB singleton and passes it through.

 ---

@@ -13,8 +13,6 @@ from alfred.application.filesystem import (
    MoveMediaUseCase,
    SetFolderPathUseCase,
 )
-from alfred.application.filesystem.detect_media_type import detect_media_type
-from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
 from alfred.application.filesystem.resolve_destination import (
    resolve_episode_destination as _resolve_episode_destination,
 )
@@ -28,10 +26,16 @@ from alfred.application.filesystem.resolve_destination import (
    resolve_series_destination as _resolve_series_destination,
 )
 from alfred.infrastructure.filesystem import FileManager, create_folder, move
-from alfred.infrastructure.filesystem.ffprobe import probe
-from alfred.infrastructure.filesystem.find_video import find_video_file
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
 from alfred.infrastructure.metadata import MetadataStore
 from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.probe import FfprobeMediaProber
+
+# Agent-tools frontier: this is the legitimate home for the singletons that
+# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
+# as required params; tests inject their own stubs.
+_KB = YamlReleaseKnowledge()
+_PROBER = FfprobeMediaProber()

 _LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"

@@ -57,10 +61,17 @@ def resolve_season_destination(
    tmdb_title: str,
    tmdb_year: int,
    confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> dict[str, Any]:
    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
    return _resolve_season_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        _KB,
+        _PROBER,
+        confirmed_folder,
+        source_path,
    ).to_dict()


@@ -78,6 +89,8 @@ def resolve_episode_destination(
        source_file,
        tmdb_title,
        tmdb_year,
+        _KB,
+        _PROBER,
        tmdb_episode_title,
        confirmed_folder,
    ).to_dict()
@@ -91,7 +104,7 @@ def resolve_movie_destination(
 ) -> dict[str, Any]:
    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
    return _resolve_movie_destination(
-        release_name, source_file, tmdb_title, tmdb_year
+        release_name, source_file, tmdb_title, tmdb_year, _KB, _PROBER
    ).to_dict()


@@ -100,10 +113,17 @@ def resolve_series_destination(
    tmdb_title: str,
    tmdb_year: int,
    confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> dict[str, Any]:
    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
    return _resolve_series_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        _KB,
+        _PROBER,
+        confirmed_folder,
+        source_path,
    ).to_dict()


@@ -190,21 +210,10 @@ def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:

 def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
-    from alfred.domain.release.services import parse_release  # noqa: PLC0415
-
-    path = Path(source_path)
-    parsed = parse_release(release_name)
-    parsed.media_type = detect_media_type(parsed, path)
-
-    probe_used = False
-    if parsed.media_type not in ("unknown", "other"):
-        video_file = find_video_file(path)
-        if video_file:
-            media_info = probe(video_file)
-            if media_info:
-                enrich_from_probe(parsed, media_info)
-                probe_used = True
+    from alfred.application.release import inspect_release  # noqa: PLC0415

+    result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
+    parsed = result.parsed
    return {
        "status": "ok",
        "media_type": parsed.media_type,
@@ -226,7 +235,10 @@ def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
        "edition": parsed.edition,
        "site_tag": parsed.site_tag,
        "is_season_pack": parsed.is_season_pack,
-        "probe_used": probe_used,
+        "probe_used": result.probe_used,
+        "confidence": result.report.confidence,
+        "road": result.report.road,
+        "recommended_action": result.recommended_action,
    }


@@ -240,7 +252,7 @@ def probe_media(source_path: str) -> dict[str, Any]:
            "message": f"{source_path} does not exist",
        }

-    media_info = probe(path)
+    media_info = _PROBER.probe(path)
    if media_info is None:
        return {
            "status": "error",
@@ -80,3 +80,6 @@ returns:
      site_tag: Source-site tag if present.
      is_season_pack: True when the folder contains a full season.
      probe_used: True when ffprobe successfully enriched the result.
+      confidence: Parser confidence score, 0–100 (higher = more reliable).
+      road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
+      recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
@@ -61,6 +61,17 @@ parameters:
      one.
    example: Oz.1997.1080p.WEBRip.x265-KONTRAST

+  source_path:
+    description: |
+      Absolute path to the release folder on disk. Optional.
+    why_needed: |
+      When provided, the tool runs ffprobe on the main video inside the
+      folder and uses the probe data to fill quality/codec tokens that
+      may be missing from the release name. The enriched tech tokens
+      end up in the destination folder name, so providing source_path
+      gives more accurate names for releases with sparse metadata.
+    example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
+
 returns:
  ok:
    description: Paths resolved unambiguously; ready to move.
@@ -56,6 +56,16 @@ parameters:
      Forces the use case to use this exact folder name and skip detection.
    example: The.Wire.2002.1080p.BluRay.x265-GROUP

+  source_path:
+    description: |
+      Absolute path to the release folder on disk. Optional.
+    why_needed: |
+      When provided, the tool runs ffprobe on the main video inside the
+      folder and uses probe data to fill quality/codec tokens that may
+      be missing from the release name, producing a more accurate
+      destination folder name.
+    example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
+
 returns:
  ok:
    description: Path resolved; ready to move the pack.
@@ -1,82 +0,0 @@
-"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
-
-from __future__ import annotations
-
-from alfred.domain.release.value_objects import ParsedRelease
-from alfred.domain.shared.media import MediaInfo
-
-# Map ffprobe codec names to scene-style codec tokens
-_VIDEO_CODEC_MAP = {
-    "hevc": "x265",
-    "h264": "x264",
-    "h265": "x265",
-    "av1": "AV1",
-    "vp9": "VP9",
-    "mpeg4": "XviD",
-}
-
-# Map ffprobe audio codec names to scene-style tokens
-_AUDIO_CODEC_MAP = {
-    "eac3": "EAC3",
-    "ac3": "AC3",
-    "dts": "DTS",
-    "truehd": "TrueHD",
-    "aac": "AAC",
-    "flac": "FLAC",
-    "opus": "OPUS",
-    "mp3": "MP3",
-    "pcm_s16l": "PCM",
-    "pcm_s24l": "PCM",
-}
-
-# Map channel count to standard layout string
-_CHANNEL_MAP = {
-    8: "7.1",
-    6: "5.1",
-    2: "2.0",
-    1: "1.0",
-}
-
-
-def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
-    """
-    Fill None fields in parsed using data from ffprobe MediaInfo.
-
-    Only overwrites fields that are currently None — token-level values
-    from the release name always take priority.
-    Mutates parsed in place.
-    """
-    if parsed.quality is None and info.resolution:
-        parsed.quality = info.resolution
-
-    if parsed.codec is None and info.video_codec:
-        parsed.codec = _VIDEO_CODEC_MAP.get(
-            info.video_codec.lower(), info.video_codec.upper()
-        )
-
-    if parsed.bit_depth is None and info.video_codec:
-        # ffprobe exposes bit depth via pix_fmt — not in MediaInfo yet, skip for now
-        pass
-
-    # Audio — use the default track, fallback to first
-    default_track = next((t for t in info.audio_tracks if t.is_default), None)
-    track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
-
-    if track:
-        if parsed.audio_codec is None and track.codec:
-            parsed.audio_codec = _AUDIO_CODEC_MAP.get(
-                track.codec.lower(), track.codec.upper()
-            )
-
-        if parsed.audio_channels is None and track.channels:
-            parsed.audio_channels = _CHANNEL_MAP.get(
-                track.channels, f"{track.channels}ch"
-            )
-
-    # Languages — merge ffprobe languages with token-level ones
-    # "und" = undetermined, not useful
-    if info.audio_languages:
-        existing = set(parsed.languages)
-        for lang in info.audio_languages:
-            if lang.lower() != "und" and lang.upper() not in existing:
-                parsed.languages.append(lang)
@@ -4,7 +4,7 @@ import logging
 from pathlib import Path

 from alfred.domain.shared.value_objects import ImdbId
-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
 from alfred.domain.subtitles.services.matcher import SubtitleMatcher
 from alfred.domain.subtitles.services.pattern_detector import PatternDetector
@@ -278,7 +278,7 @@ class ManageSubtitlesUseCase:


 def _to_unresolved_dto(
-    track: SubtitleCandidate, min_confidence: float = 0.7
+    track: SubtitleScanResult, min_confidence: float = 0.7
 ) -> UnresolvedTrack:
    reason = "unknown_language" if track.language is None else "low_confidence"
    return UnresolvedTrack(
@@ -291,10 +291,10 @@ def _to_unresolved_dto(

 def _pair_placed_with_tracks(
    placed: list[PlacedTrack],
-    tracks: list[SubtitleCandidate],
-) -> list[tuple[PlacedTrack, SubtitleCandidate]]:
+    tracks: list[SubtitleScanResult],
+) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
    """
-    Pair each PlacedTrack with its originating SubtitleCandidate by source path.
+    Pair each PlacedTrack with its originating SubtitleScanResult by source path.
    Falls back to positional matching if paths don't align.
    """
    track_by_path = {t.file_path: t for t in tracks if t.file_path}
@@ -8,34 +8,58 @@ Four distinct use cases, one per release type:
 - resolve_series_destination    : complete series multi-season pack (folder move)

 Each returns a dedicated DTO with only the fields that make sense for that type.
+
+These use cases follow Option B of the snapshot-VO design: ``ParsedRelease``
+arrives with ``title_sanitized`` already computed, and TMDB-supplied strings
+are sanitized **at the use-case boundary** (here) before being passed into
+``ParsedRelease`` builder methods. The builders themselves perform no I/O and
+no sanitization.
 """

 from __future__ import annotations

 import logging
-import re
 from dataclasses import dataclass
 from pathlib import Path

+from alfred.application.release import inspect_release
 from alfred.domain.release import parse_release
+from alfred.domain.release.ports import ReleaseKnowledge
+from alfred.domain.release.value_objects import ParsedRelease
+from alfred.domain.shared.ports import MediaProber
 from alfred.infrastructure.persistence import get_memory

 logger = logging.getLogger(__name__)

-_WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')

+def _resolve_parsed(
+    release_name: str,
+    source_path: str | None,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
+) -> ParsedRelease:
+    """Pick the right entry point depending on whether we have a path.

-def _sanitize(text: str) -> str:
-    return _WIN_FORBIDDEN.sub("", text)
+    When ``source_path`` is provided and points to something that exists,
+    we run the full inspection pipeline so probe data can refresh tech
+    fields (which feed every filename builder). Otherwise we fall back
+    to a parse-only path — same behavior as before.
+    """
+    if source_path:
+        path = Path(source_path)
+        if path.exists():
+            return inspect_release(release_name, path, kb, prober).parsed
+    parsed, _ = parse_release(release_name, kb)
+    return parsed


 def _find_existing_tvshow_folders(
-    tv_root: Path, tmdb_title: str, tmdb_year: int
+    tv_root: Path, tmdb_title_safe: str, tmdb_year: int
 ) -> list[str]:
    """Return folder names in tv_root that match title + year prefix."""
    if not tv_root.exists():
        return []
-    clean_title = _sanitize(tmdb_title).replace(" ", ".")
+    clean_title = tmdb_title_safe.replace(" ", ".")
    prefix = f"{clean_title}.{tmdb_year}".lower()
    return sorted(
        entry.name
@@ -66,6 +90,7 @@ class _Clarification:
 def _resolve_series_folder(
    tv_root: Path,
    tmdb_title: str,
+    tmdb_title_safe: str,
    tmdb_year: int,
    computed_name: str,
    confirmed_folder: str | None,
@@ -80,7 +105,7 @@ def _resolve_series_folder(
    if confirmed_folder:
        return confirmed_folder, not (tv_root / confirmed_folder).exists()

-    existing = _find_existing_tvshow_folders(tv_root, tmdb_title, tmdb_year)
+    existing = _find_existing_tvshow_folders(tv_root, tmdb_title_safe, tmdb_year)

    if not existing:
        return computed_name, True
@@ -230,13 +255,20 @@ def resolve_season_destination(
    release_name: str,
    tmdb_title: str,
    tmdb_year: int,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
    confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> ResolvedSeasonDestination:
    """
    Compute destination paths for a season pack.

    Returns series_folder + season_folder. No file paths — the whole
    source folder is moved as-is into season_folder.
+
+    When ``source_path`` points to the release on disk, the parser is
+    augmented with ffprobe data so tech tokens missing from the release
+    name (quality / codec) end up in the folder names.
    """
    tv_root = _get_tv_root()
    if not tv_root:
@@ -246,11 +278,12 @@ def resolve_season_destination(
            message="TV show library path is not configured.",
        )

-    parsed = parse_release(release_name)
-    computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
+    parsed = _resolve_parsed(release_name, source_path, kb, prober)
+    tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
+    computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)

    resolved = _resolve_series_folder(
-        tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
+        tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
    )
    if isinstance(resolved, _Clarification):
        return ResolvedSeasonDestination(
@@ -279,6 +312,8 @@ def resolve_episode_destination(
    source_file: str,
    tmdb_title: str,
    tmdb_year: int,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
    tmdb_episode_title: str | None = None,
    confirmed_folder: str | None = None,
 ) -> ResolvedEpisodeDestination:
@@ -286,6 +321,8 @@ def resolve_episode_destination(
    Compute destination paths for a single episode file.

    Returns series_folder + season_folder + library_file (full path to .mkv).
+    ``source_file`` doubles as the inspection target — when it exists,
+    ffprobe enrichment refreshes tech tokens missing from the release name.
    """
    tv_root = _get_tv_root()
    if not tv_root:
@@ -295,12 +332,16 @@ def resolve_episode_destination(
            message="TV show library path is not configured.",
        )

-    parsed = parse_release(release_name)
+    parsed = _resolve_parsed(release_name, source_file, kb, prober)
    ext = Path(source_file).suffix
-    computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
+    tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
+    tmdb_episode_title_safe = (
+        kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
+    )
+    computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)

    resolved = _resolve_series_folder(
-        tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
+        tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
    )
    if isinstance(resolved, _Clarification):
        return ResolvedEpisodeDestination(
@@ -311,7 +352,7 @@ def resolve_episode_destination(

    series_folder_name, is_new = resolved
    season_folder_name = parsed.season_folder_name()
-    filename = _sanitize(parsed.episode_filename(tmdb_episode_title, ext))
+    filename = parsed.episode_filename(tmdb_episode_title_safe, ext)

    series_path = tv_root / series_folder_name
    season_path = series_path / season_folder_name
@@ -334,11 +375,15 @@ def resolve_movie_destination(
    source_file: str,
    tmdb_title: str,
    tmdb_year: int,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
 ) -> ResolvedMovieDestination:
    """
    Compute destination paths for a movie file.

    Returns movie_folder + library_file (full path to .mkv).
+    ``source_file`` doubles as the inspection target — when it exists,
+    ffprobe enrichment refreshes tech tokens missing from the release name.
    """
    memory = get_memory()
    movies_root = memory.ltm.library_paths.get("movie")
@@ -349,11 +394,12 @@ def resolve_movie_destination(
            message="Movie library path is not configured.",
        )

-    parsed = parse_release(release_name)
+    parsed = _resolve_parsed(release_name, source_file, kb, prober)
    ext = Path(source_file).suffix
+    tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)

-    folder_name = _sanitize(parsed.movie_folder_name(tmdb_title, tmdb_year))
-    filename = _sanitize(parsed.movie_filename(tmdb_title, tmdb_year, ext))
+    folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
+    filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)

    folder_path = Path(movies_root) / folder_name
    file_path = folder_path / filename
@@ -372,12 +418,18 @@ def resolve_series_destination(
    release_name: str,
    tmdb_title: str,
    tmdb_year: int,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
    confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> ResolvedSeriesDestination:
    """
    Compute destination path for a complete multi-season series pack.

    Returns only series_folder — the whole pack lands directly inside it.
+
+    When ``source_path`` points to the release on disk, ffprobe
+    enrichment refreshes tech tokens missing from the release name.
    """
    tv_root = _get_tv_root()
    if not tv_root:
@@ -387,11 +439,12 @@ def resolve_series_destination(
            message="TV show library path is not configured.",
        )

-    parsed = parse_release(release_name)
-    computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
+    parsed = _resolve_parsed(release_name, source_path, kb, prober)
+    tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
+    computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)

    resolved = _resolve_series_folder(
-        tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
+        tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
    )
    if isinstance(resolved, _Clarification):
        return ResolvedSeriesDestination(
@@ -0,0 +1,20 @@
+"""Release application layer — orchestrators sitting between domain
+parsing and infrastructure I/O.
+
+Public surface:
+
+- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
+  filesystem helpers (extension-only filtering, top-level video pick).
+- :func:`inspect_release` / :class:`InspectedResult` — full inspection
+  pipeline combining parse + filesystem refinement + probe enrichment.
+"""
+
+from .inspect import InspectedResult, inspect_release
+from .supported_media import find_main_video, is_supported_video
+
+__all__ = [
+    "InspectedResult",
+    "find_main_video",
+    "inspect_release",
+    "is_supported_video",
+]
@@ -19,15 +19,13 @@ from __future__ import annotations

 from pathlib import Path

-from alfred.domain.release.value_objects import (
-    _METADATA_EXTENSIONS,
-    _NON_VIDEO_EXTENSIONS,
-    _VIDEO_EXTENSIONS,
-    ParsedRelease,
-)
+from alfred.domain.release.ports import ReleaseKnowledge
+from alfred.domain.release.value_objects import ParsedRelease


-def detect_media_type(parsed: ParsedRelease, source_path: Path) -> str:
+def detect_media_type(
+    parsed: ParsedRelease, source_path: Path, kb: ReleaseKnowledge
+) -> str:
    """
    Return a refined media_type string for the given source_path.

@@ -37,10 +35,10 @@ def detect_media_type(parsed: ParsedRelease, source_path: Path) -> str:
    extensions = _collect_extensions(source_path)
    # Metadata extensions (.nfo, .srt, …) are always present alongside releases
    # and must not influence the type decision.
-    conclusive = extensions - _METADATA_EXTENSIONS
+    conclusive = extensions - kb.metadata_extensions

-    has_video = bool(conclusive & _VIDEO_EXTENSIONS)
-    has_non_video = bool(conclusive & _NON_VIDEO_EXTENSIONS)
+    has_video = bool(conclusive & kb.video_extensions)
+    has_non_video = bool(conclusive & kb.non_video_extensions)

    if has_video and has_non_video:
        return "unknown"
@@ -0,0 +1,74 @@
+"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
+
+from __future__ import annotations
+
+from dataclasses import replace
+
+from alfred.domain.release.ports import ReleaseKnowledge
+from alfred.domain.release.value_objects import ParsedRelease
+from alfred.domain.shared.media import MediaInfo
+
+
+def enrich_from_probe(
+    parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
+) -> ParsedRelease:
+    """
+    Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
+
+    Scalar fields are filled only when currently None, so release-name
+    tokens always win; probe languages are appended when missing.
+    Frozen input: returns ``replace(parsed, ...)``, or ``parsed`` if unchanged.
+
+    Translation tables (ffprobe codec name → scene token, channel count
+    → layout) live in ``kb.probe_mappings`` (loaded from
+    ``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
+    reports a value with no mapping entry, the fallback is the uppercase
+    raw value so unknown codecs still surface in a predictable form.
+    """
+    mappings = kb.probe_mappings
+    video_codec_map: dict[str, str] = mappings.get("video_codec", {})
+    audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
+    channel_map: dict[int, str] = mappings.get("audio_channels", {})
+
+    updates: dict[str, object] = {}
+
+    if parsed.quality is None and info.resolution:
+        updates["quality"] = info.resolution
+
+    if parsed.codec is None and info.video_codec:
+        updates["codec"] = video_codec_map.get(
+            info.video_codec.lower(), info.video_codec.upper()
+        )
+
+    # bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
+
+    # Audio — use the default track, fallback to first
+    default_track = next((t for t in info.audio_tracks if t.is_default), None)
+    track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
+
+    if track:
+        if parsed.audio_codec is None and track.codec:
+            updates["audio_codec"] = audio_codec_map.get(
+                track.codec.lower(), track.codec.upper()
+            )
+
+        if parsed.audio_channels is None and track.channels:
+            updates["audio_channels"] = channel_map.get(
+                track.channels, f"{track.channels}ch"
+            )
+
+    # Languages — merge ffprobe languages with token-level ones
+    # "und" = undetermined, not useful
+    if info.audio_languages:
+        existing_upper = {lang.upper() for lang in parsed.languages}
+        new_languages = list(parsed.languages)
+        for lang in info.audio_languages:
+            if lang.lower() != "und" and lang.upper() not in existing_upper:
+                new_languages.append(lang)
+                existing_upper.add(lang.upper())
+        if len(new_languages) != len(parsed.languages):
+            updates["languages"] = tuple(new_languages)
+
+    if not updates:
+        return parsed
+    return replace(parsed, **updates)
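
Reviewer sketch (not part of the patch): the collect-updates-then-`replace` pattern used above, reduced to a runnable form. `Parsed`, `enrich`, and the inline `codec_map` are hypothetical stand-ins for `ParsedRelease`, `enrich_from_probe`, and `kb.probe_mappings`; only the fill-if-None and language-merge logic mirrors the diff.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Parsed:
    """Hypothetical stand-in for ParsedRelease (only the fields used here)."""

    quality: Optional[str] = None
    codec: Optional[str] = None
    languages: tuple = ()


def enrich(parsed, resolution, video_codec, audio_languages, codec_map):
    """Return a copy of ``parsed`` with gaps filled, or ``parsed`` itself."""
    updates = {}

    # Scalar fields: fill only when the release name left them empty.
    if parsed.quality is None and resolution:
        updates["quality"] = resolution
    if parsed.codec is None and video_codec:
        # Unmapped codecs fall back to the uppercase raw value.
        updates["codec"] = codec_map.get(video_codec.lower(), video_codec.upper())

    # Languages: append probe languages not already present (case-insensitive),
    # ignoring "und" (undetermined).
    seen = {lang.upper() for lang in parsed.languages}
    merged = list(parsed.languages)
    for lang in audio_languages:
        if lang.lower() != "und" and lang.upper() not in seen:
            merged.append(lang)
            seen.add(lang.upper())
    if len(merged) != len(parsed.languages):
        updates["languages"] = tuple(merged)

    # Frozen dataclass: never assign, rebuild with replace().
    return replace(parsed, **updates) if updates else parsed


before = Parsed(quality="1080p", languages=("FRENCH",))
after = enrich(before, "2160p", "hevc", ["und", "eng", "ENG"], {"hevc": "x265"})
unchanged = enrich(before, None, None, [], {})
```

Here `after` keeps the name-derived `quality`, gains `codec` and one extra language, and `before` is left untouched; `unchanged` is the very same object as `before` because no update was collected.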
@@ -0,0 +1,193 @@
+"""Release inspection orchestrator — the canonical "look at this thing"
+entry point.
+
+``inspect_release`` is the single composition of the four layers we
+care about for a freshly-arrived release:
+
+1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
+   gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
+2. **Refine the media type** — :func:`detect_media_type` uses the
+   on-disk extension mix to override any token-level guess (e.g. a
+   bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
+   frozen, so the refined value is applied with
+   :func:`dataclasses.replace` rather than patched in place.
+3. **Pick the main video** — :func:`find_main_video` runs a top-level
+   scan over the source path. If nothing qualifies the result still
+   completes; downstream callers decide what to do with a videoless
+   release.
+4. **Probe the video** — the injected :class:`MediaProber` fills in
+   missing technical fields via :func:`enrich_from_probe`. Skipped
+   when there is no main video or when ``media_type`` ended up in
+   ``{"unknown", "other"}`` (the probe would tell us nothing useful).
+
+The return type is :class:`InspectedResult`, a frozen VO that bundles
+everything downstream callers need (``analyze_release`` tool,
+``resolve_destination``, future workflow stages) without forcing them
+to redo the same four calls.
+
+Design notes:
+
+- **Application layer.** This module touches both domain
+  (``parse_release``) and infrastructure (``MediaProber`` port). That
+  is exactly application's job — orchestrate.
+- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
+  ``prober`` as parameters; no module-level singletons here. Callers
+  (the tool wrapper, tests) decide what to plug in.
+- **No mutation.** ``ParsedRelease`` is frozen: the refined
+  ``media_type`` and the probe-filled fields are applied with
+  :func:`dataclasses.replace`, which returns a new instance each
+  time. The outer ``InspectedResult`` is frozen as well, so the
+  whole bundle is immutable from the caller's perspective.
+- **Never raises.** Filesystem / probe errors surface as ``None``
+  fields on the result, never as exceptions — same contract as the
+  underlying adapters.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, replace
+from pathlib import Path
+
+from alfred.application.release.detect_media_type import detect_media_type
+from alfred.application.release.enrich_from_probe import enrich_from_probe
+from alfred.application.release.supported_media import find_main_video
+from alfred.domain.release.ports import ReleaseKnowledge
+from alfred.domain.release.services import parse_release
+from alfred.domain.release.value_objects import (
+    MediaTypeToken,
+    ParsedRelease,
+    ParseReport,
+)
+from alfred.domain.shared.media import MediaInfo
+from alfred.domain.shared.ports import MediaProber
+
+
+# Media types for which a probe carries no useful information.
+_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
+
+# Media types for which there's nothing for the organizer to do.
+# ``other`` covers things like games / ISOs / archives sitting on the
+# downloads folder. ``unknown`` does NOT belong here — those need a
+# user decision, not a skip.
+_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
+
+# Roads that signal the parser couldn't reach a confident answer on its
+# own. ``Road`` values are kept as strings on the report to avoid a
+# cross-package import here.
+_ASK_USER_ROADS = frozenset({"path_of_pain"})
+
+
+@dataclass(frozen=True)
+class InspectedResult:
+    """The full picture of a release: parsed name + filesystem reality.
+
+    Bundles everything the downstream pipeline needs after a single
+    inspection pass:
+
+    - ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
+      refined by :func:`detect_media_type` and ``None`` tech fields
+      filled in by :func:`enrich_from_probe` when a probe ran.
+    - ``report`` — :class:`ParseReport` from the parser (confidence +
+      road, untouched by inspection).
+    - ``source_path`` — the path the inspector was pointed at (file or
+      folder), as supplied by the caller.
+    - ``main_video`` — the canonical video file inside ``source_path``,
+      or ``None`` if no eligible file was found.
+    - ``media_info`` — the :class:`MediaInfo` snapshot when a probe
+      succeeded; ``None`` when no video was probed (no main video, or
+      ``media_type`` in ``{"unknown", "other"}``) or when ffprobe
+      failed.
+    - ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
+      ``enrich_from_probe`` actually ran. Explicit flag so callers
+      don't have to re-derive the condition.
+    - ``recommended_action`` — derived hint for the orchestrator (see
+      property docstring). Encodes the exclusion / clarification /
+      go-ahead decision in one place so downstream callers don't
+      re-implement the same checks.
+    """
+
+    parsed: ParsedRelease
+    report: ParseReport
+    source_path: Path
+    main_video: Path | None
+    media_info: MediaInfo | None
+    probe_used: bool
+
+    @property
+    def recommended_action(self) -> str:
+        """Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
+
+        - ``"skip"`` — nothing to organize:
+            * the source has no main video file, **or**
+            * ``media_type`` is ``"other"`` (games / ISOs / archives).
+        - ``"ask_user"`` — a decision is required before any action:
+            * ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
+            * the parse landed on ``Road.PATH_OF_PAIN``
+              (low-confidence, malformed name, etc.).
+        - ``"process"`` — everything else: a confident parse with a
+          usable media type and a main video on disk. The orchestrator
+          can move straight to the planning step.
+
+        The check ordering matters: ``"skip"`` wins over ``"ask_user"``
+        because if there's no video to organize, no question to the
+        user can change that. ``"ask_user"`` then wins over
+        ``"process"`` because a confident parse alone isn't enough if
+        the type or road still flag uncertainty.
+        """
+        if self.main_video is None:
+            return "skip"
+        if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
+            return "skip"
+        if self.parsed.media_type.value == "unknown":
+            return "ask_user"
+        if self.report.road in _ASK_USER_ROADS:
+            return "ask_user"
+        return "process"
+
+
+def inspect_release(
+    release_name: str,
+    source_path: Path,
+    kb: ReleaseKnowledge,
+    prober: MediaProber,
+) -> InspectedResult:
+    """Run the full inspection pipeline on ``release_name`` /
+    ``source_path``.
+
+    See module docstring for the four-step flow. ``kb`` and ``prober``
+    are injected so the caller controls the knowledge base layering
+    and the probe adapter (real ffprobe in production, stubs in tests).
+
+    Never raises. A missing or unreadable ``source_path`` simply
+    results in ``main_video=None`` and ``media_info=None``.
+    """
+    parsed, report = parse_release(release_name, kb)
+
+    # Step 2: refine media_type from the on-disk extension mix.
+    # detect_media_type tolerates non-existent paths (returns parsed.media_type
+    # untouched), so no need to guard here. ParsedRelease is frozen — use
+    # dataclasses.replace to rebind with the refined value.
+    refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
+    if refined_media_type != parsed.media_type:
+        parsed = replace(parsed, media_type=refined_media_type)
+
+    # Step 3: pick the canonical main video (top-level scan only).
+    main_video = find_main_video(source_path, kb)
+
+    # Step 4: probe + enrich, when it makes sense.
+    media_info: MediaInfo | None = None
+    probe_used = False
+    if main_video is not None and parsed.media_type.value not in _NON_PROBABLE_MEDIA_TYPES:
+        media_info = prober.probe(main_video)
+        if media_info is not None:
+            parsed = enrich_from_probe(parsed, media_info, kb)
+            probe_used = True
+
+    return InspectedResult(
+        parsed=parsed,
+        report=report,
+        source_path=source_path,
+        main_video=main_video,
+        media_info=media_info,
+        probe_used=probe_used,
+    )
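
Reviewer sketch (not part of the patch): the precedence encoded by `InspectedResult.recommended_action`, written as a free function over plain values so the ordering is easy to check. The function signature is an assumption for illustration; the string values are the ones used in the diff.

```python
def recommended_action(has_main_video: bool, media_type: str, road: str) -> str:
    """Mirror the skip > ask_user > process ordering from the property."""
    # Nothing to organize: no video on disk, or a non-media payload.
    if not has_main_video or media_type == "other":
        return "skip"
    # A decision is needed: unclassified type or a low-confidence parse.
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    return "process"


cases = {
    "no video, even if unknown": recommended_action(False, "unknown", "path_of_pain"),
    "iso folder": recommended_action(True, "other", "easy"),
    "unknown type": recommended_action(True, "unknown", "easy"),
    "painful road": recommended_action(True, "movie", "path_of_pain"),
    "clean parse": recommended_action(True, "tv_show", "shitty"),
}
```

The first case is the one the docstring calls out: a missing main video wins over every reason to ask the user.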
@@ -0,0 +1,74 @@
+"""Pre-pipeline exclusion — decide which files are worth parsing.
+
+These helpers live one notch above the domain: they touch the
+filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
+logic of their own. The goal is to filter out non-video files and pick
+the canonical "main video" from a release folder *before* anything
+hits :func:`~alfred.domain.release.parse_release`.
+
+Design notes (Phase A bis, 2026-05-20):
+
+- **Extension is the sole eligibility criterion.** A file is supported
+  iff its suffix is in ``kb.video_extensions``. No size threshold, no
+  filename heuristics ("sample", "trailer", …). If a release packs a
+  bloated featurette or names its sample alphabetically before the
+  main feature, that's PATH_OF_PAIN territory — not this layer's job.
+
+- **Top-level scan only.** ``find_main_video`` does not descend into
+  subdirectories. Releases that wrap the main video in ``Sample/`` or
+  similar are non-scene-standard and handled by the orchestrator
+  upstream.
+
+- **Lexicographic tie-break.** When several candidates qualify
+  (legitimate for season packs), we return the first by alphabetical
+  order. Deterministic, no size-based ranking.
+
+- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
+  is application, not domain. If isolation becomes necessary to
+  test at scale, we'll introduce a port then.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from alfred.domain.release.ports.knowledge import ReleaseKnowledge
+
+
+def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
+    """Return True when ``path`` is a video file the parser should
+    consider.
+
+    The check is purely extension-based: ``path.suffix.lower()`` must
+    belong to ``kb.video_extensions``. ``path`` must also be a regular
+    file — directories and broken symlinks return False.
+    """
+    if not path.is_file():
+        return False
+    return path.suffix.lower() in kb.video_extensions
+
+
+def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
+    """Return the canonical main video file inside ``folder``, or
+    ``None`` if there isn't one.
+
+    Behavior:
+
+    - Top-level scan only — subdirectories are ignored.
+    - Eligibility is :func:`is_supported_video`.
+    - When several files qualify, the lexicographically first one wins.
+    - When ``folder`` itself is a video file, it is returned as-is
+      (single-file releases are valid).
+    - When ``folder`` doesn't exist or isn't a directory (and isn't a
+      video file either), returns ``None``.
+    """
+    if folder.is_file():
+        return folder if is_supported_video(folder, kb) else None
+
+    if not folder.is_dir():
+        return None
+
+    candidates = sorted(
+        child for child in folder.iterdir() if is_supported_video(child, kb)
+    )
+    return candidates[0] if candidates else None
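
Reviewer sketch (not part of the patch): the selection rules above exercised against a temporary folder. A module-level `VIDEO_EXTENSIONS` frozenset is an assumption standing in for `kb.video_extensions`, and the file names are made up; the function bodies follow the diff.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: a plain frozenset stands in for kb.video_extensions.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4", ".avi"})


def is_supported_video(path: Path) -> bool:
    # Regular file with a known video suffix (case-insensitive).
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def find_main_video(folder: Path) -> Optional[Path]:
    if folder.is_file():
        return folder if is_supported_video(folder) else None
    if not folder.is_dir():
        return None
    # Top-level only; the lexicographically first candidate wins.
    candidates = sorted(c for c in folder.iterdir() if is_supported_video(c))
    return candidates[0] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("Show.S01E02.mkv", "Show.S01E01.MKV", "Show.S01.nfo"):
        (root / name).touch()
    # A nested sample must be ignored even though it sorts first by name.
    (root / "Sample").mkdir()
    (root / "Sample" / "Aaa.sample.mkv").touch()

    pack_pick = find_main_video(root).name
    single_pick = find_main_video(root / "Show.S01E02.mkv").name
    missing_pick = find_main_video(root / "missing")
    nfo_pick = find_main_video(root / "Show.S01.nfo")
```

The season-pack case resolves to the first episode by name, a single video file is returned as-is, and both the missing path and the `.nfo` file yield `None`.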
@@ -5,13 +5,13 @@ import os
 from dataclasses import dataclass
 from pathlib import Path

-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.domain.subtitles.value_objects import SubtitleType

 logger = logging.getLogger(__name__)


-def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
+def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
    """
    Build the destination filename for a subtitle track.

@@ -41,7 +41,7 @@ class PlacedTrack:
 @dataclass
 class PlaceResult:
    placed: list[PlacedTrack]
-    skipped: list[tuple[SubtitleCandidate, str]]  # (track, reason)
+    skipped: list[tuple[SubtitleScanResult, str]]  # (track, reason)

    @property
    def placed_count(self) -> int:
@@ -54,7 +54,7 @@ class PlaceResult:

 class SubtitlePlacer:
    """
-    Hard-links matched SubtitleCandidate files next to a destination video.
+    Hard-links matched SubtitleScanResult files next to a destination video.

    Uses the same hard-link strategy as FileManager.copy_file:
    instant, no data duplication, qBittorrent keeps seeding.
@@ -64,11 +64,11 @@ class SubtitlePlacer:

    def place(
        self,
-        tracks: list[SubtitleCandidate],
+        tracks: list[SubtitleScanResult],
        destination_video: Path,
    ) -> PlaceResult:
        placed: list[PlacedTrack] = []
-        skipped: list[tuple[SubtitleCandidate, str]] = []
+        skipped: list[tuple[SubtitleScanResult, str]] = []

        dest_dir = destination_video.parent

@@ -8,19 +8,22 @@ from ..shared.value_objects import FilePath, FileSize, ImdbId
 from .value_objects import MovieTitle, Quality, ReleaseYear


-@dataclass(eq=False)
+@dataclass(frozen=True, eq=False)
 class Movie(MediaWithTracks):
    """
    Movie aggregate root for the movies domain.

    Carries file metadata (path, size) and the tracks discovered by the
-    ffprobe + subtitle scan pipeline. The track lists may be empty when the
+    ffprobe + subtitle scan pipeline. The track tuples may be empty when the
    movie is known but not yet scanned, or when no file is downloaded.

    Track helpers follow the same "C+" contract as ``Episode``: pass a
    ``Language`` for cross-format matching, or a ``str`` for case-insensitive
    direct comparison.

+    Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
+    (audio/subtitle tracks, file metadata) onto a new instance.
+
    Equality is identity-based: two ``Movie`` instances are equal iff they
    share the same ``imdb_id``, regardless of file/track contents. This is
    the DDD aggregate invariant — the aggregate is identified by its root id.
@@ -34,15 +37,15 @@ class Movie(MediaWithTracks):
    file_size: FileSize | None = None
    tmdb_id: int | None = None
    added_at: datetime = field(default_factory=datetime.now)
-    audio_tracks: list[AudioTrack] = field(default_factory=list)
-    subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
+    audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
+    subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)

    def __post_init__(self):
        """Validate movie entity."""
        # Ensure ImdbId is actually an ImdbId instance
        if not isinstance(self.imdb_id, ImdbId):
            if isinstance(self.imdb_id, str):
-                self.imdb_id = ImdbId(self.imdb_id)
+                object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))
            else:
                raise ValueError(
                    f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
@@ -51,7 +54,7 @@ class Movie(MediaWithTracks):
        # Ensure MovieTitle is actually a MovieTitle instance
        if not isinstance(self.title, MovieTitle):
            if isinstance(self.title, str):
-                self.title = MovieTitle(self.title)
+                object.__setattr__(self, "title", MovieTitle(self.title))
            else:
                raise ValueError(
                    f"title must be MovieTitle or str, got {type(self.title)}"
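
The frozen-plus-coercion pattern in this hunk can be tried in isolation with the sketch below. The `ImdbId` and `Movie` classes here are simplified, hypothetical stand-ins (two fields, string tracks), not the real entities; only `dataclasses.replace`, `FrozenInstanceError` and `object.__setattr__` are standard library behavior.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class ImdbId:  # hypothetical, simplified value object
    value: str


@dataclass(frozen=True, eq=False)
class Movie:  # hypothetical, simplified stand-in for the real entity
    imdb_id: ImdbId | str
    audio_tracks: tuple[str, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if isinstance(self.imdb_id, str):
            # Plain `self.imdb_id = ...` would raise FrozenInstanceError,
            # so coercion goes through object.__setattr__.
            object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))

    # Identity-based equality: only the root id matters.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, Movie) and self.imdb_id == other.imdb_id

    def __hash__(self) -> int:
        return hash(self.imdb_id)


movie = Movie("tt0133093")
# Enrichment results are projected onto a new instance.
enriched = replace(movie, audio_tracks=("eng", "fre"))

try:
    movie.audio_tracks = ("eng",)  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Because `replace` goes through `__init__`, `__post_init__` runs again on the new instance, so the coercion has to tolerate values that are already coerced.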
@@ -1,6 +1,6 @@
 """Release domain — release name parsing and naming conventions."""

 from .services import parse_release
-from .value_objects import ParsedRelease
+from .value_objects import ParsedRelease, ParseReport

-__all__ = ["ParsedRelease", "parse_release"]
+__all__ = ["ParsedRelease", "ParseReport", "parse_release"]
@@ -0,0 +1,31 @@
+"""Release parser v2 — annotate-based pipeline.
+
+This package is the future home of ``parse_release``. It restructures the
+parsing logic around a **tokenize → annotate → assemble** pipeline:
+
+1. **tokenize**: split the release name into atomic tokens.
+2. **annotate**: walk tokens left-to-right, assigning each one a
+   :class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
+   injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
+3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.
+
+The pipeline has three internal paths driven by the detected release group:
+
+- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
+  declared in ``knowledge/release/release_groups/<group>.yaml``.
+- **SHITTY**: unknown group, best-effort matching against the global
+  knowledge sets, with a 0-100 confidence score.
+- **PATH OF PAIN**: score below threshold OR critical chunks missing —
+  signaled to the caller, who decides whether to involve the LLM/user.
+
+Today the package exposes scaffolding only (token VOs and a thin pipeline
+stub). The legacy ``parse_release`` in ``release.services`` keeps serving
+production until each piece of the v2 pipeline is wired in.
+"""
+
+from __future__ import annotations
+
+from .schema import GroupSchema, SchemaChunk
+from .tokens import Token, TokenRole
+
+__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
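
As a rough illustration of the tokenize → annotate → assemble shape described in the package docstring above, here is a deliberately tiny sketch. Everything in it is an assumption made for the example: `Role`, `Tok` and `RESOLUTIONS` are hypothetical stand-ins for `TokenRole`, `Token` and the injected knowledge base, it only splits on `.`, and it only knows about years and resolutions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):  # hypothetical, far smaller than the real TokenRole
    UNKNOWN = auto()
    TITLE = auto()
    YEAR = auto()
    RESOLUTION = auto()


@dataclass(frozen=True)
class Tok:  # hypothetical stand-in for Token
    text: str
    role: Role = Role.UNKNOWN


RESOLUTIONS = {"720p", "1080p", "2160p"}  # assumed knowledge set


def tokenize(name: str) -> list[Tok]:
    """Stage 1: split into atomic tokens, all UNKNOWN."""
    return [Tok(piece) for piece in name.split(".") if piece]


def annotate(tokens: list[Tok]) -> list[Tok]:
    """Stage 2: walk left to right and assign a role to each token."""
    annotated: list[Tok] = []
    in_title = True
    for tok in tokens:
        if len(tok.text) == 4 and tok.text.isdigit() and 1900 <= int(tok.text) <= 2099:
            role = Role.YEAR
        elif tok.text.lower() in RESOLUTIONS:
            role = Role.RESOLUTION
        elif in_title:
            role = Role.TITLE
        else:
            role = Role.UNKNOWN
        # The title is the leftmost run; it closes at the first other role.
        in_title = in_title and role is Role.TITLE
        annotated.append(Tok(tok.text, role))
    return annotated


def assemble(tokens: list[Tok]) -> dict[str, object]:
    """Stage 3: fold annotated tokens into a plain dict."""
    title = ".".join(t.text for t in tokens if t.role is Role.TITLE)
    year = next((int(t.text) for t in tokens if t.role is Role.YEAR), None)
    quality = next((t.text for t in tokens if t.role is Role.RESOLUTION), None)
    return {"title": title, "year": year, "quality": quality}


parsed = assemble(annotate(tokenize("The.Matrix.1999.1080p.BluRay")))
```

Tokens that no rule recognizes (`BluRay` here, since the sketch has no source set) simply stay UNKNOWN and are ignored by the assemble step.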
@@ -0,0 +1,763 @@
+"""Annotate-based pipeline.
+
+Three stages:
+
+1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
+   a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
+   tokenized.
+2. :func:`annotate` — promote each token's :class:`TokenRole` using the
+   injected knowledge base. Two sub-passes:
+
+     a. **Structural** (schema-driven, EASY only). Detects the group at
+        the right end, looks up its :class:`GroupSchema`, then matches
+        the schema's chunk sequence against the token stream. Between
+        two structural chunks, any number of unmatched tokens may
+        remain — they are left UNKNOWN for the enricher pass to handle.
+     b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
+        audio / video-meta / edition / language roles. Multi-token
+        sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
+        matched first, single tokens after.
+
+3. :func:`assemble` — fold annotated tokens into a
+   :class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
+   dict.
+
+The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
+arrives through ``kb: ReleaseKnowledge``.
+"""
+
+from __future__ import annotations
+
+from ..ports.knowledge import ReleaseKnowledge
+from ..value_objects import MediaTypeToken
+from .schema import GroupSchema
+from .tokens import Token, TokenRole
+
+
+# ---------------------------------------------------------------------------
+# Stage 1 — tokenize
+# ---------------------------------------------------------------------------
+
+
+def strip_site_tag(name: str) -> tuple[str, str | None]:
+    """Split off a ``[site.tag]`` prefix or suffix.
+
+    Returns ``(clean_name, tag)``. If no tag is found, returns
+    ``(name.strip(), None)``.
+    """
+    s = name.strip()
+
+    if s.startswith("["):
+        close = s.find("]")
+        if close != -1:
+            tag = s[1:close].strip()
+            remainder = s[close + 1 :].strip()
+            if tag and remainder:
+                return remainder, tag
+
+    if s.endswith("]"):
+        open_bracket = s.rfind("[")
+        if open_bracket != -1:
+            tag = s[open_bracket + 1 : -1].strip()
+            remainder = s[:open_bracket].strip()
+            if tag and remainder:
+                return remainder, tag
+
+    return s, None
+
+
+def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
+    """Split ``name`` into tokens after stripping any site tag.
+
+    String-ops style: replace every configured separator with a single
+    NUL character, then split on it. NUL cannot legally appear in a
+    release name, so it's a safe sentinel.
+    """
+    clean, site_tag = strip_site_tag(name)
+
+    DELIM = "\x00"
+    buf = clean
+    for sep in kb.separators:
+        if sep != DELIM:
+            buf = buf.replace(sep, DELIM)
+
+    pieces = [p for p in buf.split(DELIM) if p]
+    tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
+    return tokens, site_tag
+
+
+# ---------------------------------------------------------------------------
+# Helpers shared across passes
+# ---------------------------------------------------------------------------
+
+
+def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
+    """Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
+    ``Sxx-yy`` (season range) / ``NxNN``.
+
+    Returns ``(season, episode, episode_end)`` or ``None`` if the token
+    is not a season/episode marker. For ``Sxx-yy``, returns the first
+    season with no episode info — the caller is expected to detect the
+    range form and promote ``media_type`` to ``tv_complete`` separately.
+    """
+    upper = text.upper()
+
+    # SxxExx form (and Sxx, Sxx-yy)
+    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
+        season = int(upper[1:3])
+        rest = upper[3:]
+
+        if not rest:
+            return season, None, None
+
+        # Sxx-yy season-range form: capture the first season, treat as a
+        # complete-series marker (no episode info).
+        if (
+            len(rest) == 3
+            and rest[0] == "-"
+            and rest[1:3].isdigit()
+        ):
+            return season, None, None
+
+        episodes: list[int] = []
+        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
+            episodes.append(int(rest[1:3]))
+            rest = rest[3:]
+
+        if not episodes:
+            return None
+        # For chained multi-episode markers (E09E10E11), the range is the
+        # first → last episode. Intermediate values are implied.
+        return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None
+
+    # NxNN form
+    if "X" in upper:
+        parts = upper.split("X")
+        if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
+            season = int(parts[0])
+            episode = int(parts[1])
+            episode_end = int(parts[2]) if len(parts) >= 3 else None
+            return season, episode, episode_end
+
+    return None
+
+
+def _is_year(text: str) -> bool:
+    """Return True if ``text`` is a 4-digit year in [1900, 2099]."""
+    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099
+
+
+def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
+    """Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.
+
+    Returns ``None`` if the token doesn't match the ``codec-GROUP``
+    shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
+    """
+    if "-" not in text:
+        return None
+    head, _, tail = text.rpartition("-")
+    if head.lower() in kb.codecs:
+        return head, tail
+    return None
+
+
+def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
+    """Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
+    lower = text.lower()
+
+    if role is TokenRole.YEAR:
+        return TokenRole.YEAR if _is_year(text) else None
+
+    if role is TokenRole.SEASON_EPISODE:
+        return (
+            TokenRole.SEASON_EPISODE
+            if _parse_season_episode(text) is not None
+            else None
+        )
+
+    if role is TokenRole.RESOLUTION:
+        return TokenRole.RESOLUTION if lower in kb.resolutions else None
+
+    if role is TokenRole.SOURCE:
+        return TokenRole.SOURCE if lower in kb.sources else None
+
+    if role is TokenRole.CODEC:
+        return TokenRole.CODEC if lower in kb.codecs else None
+
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Stage 2a — group detection
+# ---------------------------------------------------------------------------
+
+
+def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
+    """Identify the release group by walking tokens right-to-left.
+
+    Returns ``(group_name, token_index_carrying_group)``. The index is
+    ``None`` and the name is ``"UNKNOWN"`` when no token carries a group
+    (no ``codec-GROUP`` token and no other dashed, non-source token).
+    """
+    # Priority 1: codec-GROUP shape (clearest signal).
+    for tok in reversed(tokens):
+        split = _split_codec_group(tok.text, kb)
+        if split is not None:
+            _, group = split
+            return (group or "UNKNOWN"), tok.index
+
+    # Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
+    for tok in reversed(tokens):
+        if "-" not in tok.text:
+            continue
+        head, _, tail = tok.text.rpartition("-")
+        if (
+            head.lower() in kb.sources
+            or tok.text.lower().replace("-", "") in kb.sources
+        ):
+            continue
+        if tail:
+            return tail, tok.index
+
+    return "UNKNOWN", None
+
+
+# ---------------------------------------------------------------------------
+# Stage 2b — structural annotation (schema-driven)
+# ---------------------------------------------------------------------------
+
+
+def _annotate_structural(
+    tokens: list[Token],
+    kb: ReleaseKnowledge,
+    schema: GroupSchema,
+    group_token_index: int,
+) -> list[Token] | None:
+    """Annotate structural tokens following a known group schema.
+
+    Walks the schema's chunks against the body (tokens up to the group
+    token). For each chunk, scans forward in the body for a matching
+    token — tokens passed over without match are left UNKNOWN (the
+    enricher pass will handle them).
+
+    Returns ``None`` if any mandatory chunk fails to find a match.
+    """
+    result = list(tokens)
+
+    # The codec-GROUP token carries CODEC + GROUP. Split it now so the
+    # schema walk knows the codec is "pre-consumed" at the end.
+    group_token = result[group_token_index]
+    cg_split = _split_codec_group(group_token.text, kb)
+    codec_pre_consumed = False
+    if cg_split is not None:
+        codec, group = cg_split
+        result[group_token_index] = group_token.with_role(
+            TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
+        )
+        codec_pre_consumed = True
+    else:
+        head, _, tail = group_token.text.rpartition("-")
+        result[group_token_index] = group_token.with_role(
+            TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
+        )
+
+    body_end = group_token_index  # exclusive
+    tok_idx = 0
+    chunk_idx = 0
+
+    # 1) TITLE — leftmost contiguous tokens up to the first structural
+    #    boundary. Title is special because it can be multi-token.
+    while (
+        chunk_idx < len(schema.chunks)
+        and schema.chunks[chunk_idx].role is TokenRole.TITLE
+    ):
+        title_end = _find_title_end(result, body_end, kb)
+        for i in range(tok_idx, title_end):
+            result[i] = result[i].with_role(TokenRole.TITLE)
+        tok_idx = title_end
+        chunk_idx += 1
+
+    # 2) Remaining structural chunks. For each, scan forward in the body
+    #    for a matching token; tokens passed over remain UNKNOWN.
+    for chunk in schema.chunks[chunk_idx:]:
+        if chunk.role is TokenRole.GROUP:
+            continue
+        if chunk.role is TokenRole.CODEC and codec_pre_consumed:
+            continue
+
+        match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
+        if match_idx is None:
+            if chunk.optional:
+                continue
+            return None
+
+        result[match_idx] = result[match_idx].with_role(chunk.role)
+        tok_idx = match_idx + 1
+
+    return result
+
+
+def _find_title_end(
+    tokens: list[Token], body_end: int, kb: ReleaseKnowledge
+) -> int:
+    """Return the exclusive index where the title ends.
+
+    The title is the leftmost run of tokens whose text does not match
+    any structural role (year, season/episode, resolution, source,
+    codec). Enricher tokens (audio, HDR, language) are *not* boundaries
+    because they can appear in the middle of the structural sequence;
+    however, in canonical scene names they don't appear inside the title
+    itself, so this heuristic holds in practice.
+    """
+    for i in range(body_end):
+        text = tokens[i].text
+        if _parse_season_episode(text) is not None:
+            return i
+        if _is_year(text):
+            return i
+        lower = text.lower()
+        if lower in kb.resolutions:
+            return i
+        if lower in kb.sources:
+            return i
+        if lower in kb.codecs:
+            return i
+        # codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
+        if "-" in text:
+            head, _, _ = text.rpartition("-")
+            if (
+                head.lower() in kb.codecs
+                or head.lower() in kb.sources
+                or text.lower().replace("-", "") in kb.sources
+            ):
+                return i
+    return body_end
+
+
+def _find_chunk(
+    tokens: list[Token],
+    start: int,
+    end: int,
+    role: TokenRole,
+    kb: ReleaseKnowledge,
+) -> int | None:
+    """Return the first index in ``[start, end)`` whose token matches ``role``.
+
+    Returns ``None`` if no token in the range matches. Tokens already
+    annotated (non-UNKNOWN) are skipped — they belong to another chunk.
+    """
+    for i in range(start, end):
+        if tokens[i].role is not TokenRole.UNKNOWN:
+            continue
+        if _match_role(tokens[i].text, role, kb) is not None:
+            return i
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Stage 2b' — SHITTY annotation (schema-less heuristic)
+# ---------------------------------------------------------------------------
+
+
+def _annotate_shitty(
+    tokens: list[Token],
+    kb: ReleaseKnowledge,
+    group_index: int | None,
+) -> list[Token]:
+    """Schema-less, dictionary-driven annotation.
+
+    SHITTY's job is narrow: for releases that *look* like scene names
+    but don't have a registered group schema, tag every token whose text
+    falls into a known YAML bucket (resolutions, codecs, sources, …).
+    Anything we can't classify stays UNKNOWN. The leftmost run of
+    UNKNOWN tokens becomes the title. Done.
+
+    Anything that requires more reasoning (parenthesized tech blocks,
+    bare-dashed title fragments, year-disguised slug suffixes, …) is
+    PATH OF PAIN territory and stays out of here on purpose.
+    """
+    result = list(tokens)
+
+    # 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
+    if group_index is not None:
+        gt = result[group_index]
+        cg_split = _split_codec_group(gt.text, kb)
+        if cg_split is not None:
+            codec, group = cg_split
+            result[group_index] = gt.with_role(
+                TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
+            )
+        else:
+            _, _, tail = gt.text.rpartition("-")
+            result[group_index] = gt.with_role(
+                TokenRole.GROUP, group=tail or "UNKNOWN"
+            )
+
+    # 2) Enrichers (audio / video-meta / edition / language).
+    result = _annotate_enrichers(result, kb)
+
+    # 3) Single pass: tag each UNKNOWN token by looking it up in the kb
+    #    buckets. First match wins per token, first occurrence wins per
+    #    role (we don't overwrite an already-tagged role).
+    matchers: list[tuple[TokenRole, callable]] = [
+        (TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
+        (TokenRole.YEAR, _is_year),
+        (TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
+        (TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
+        (TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
+        (TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
+    ]
+    seen: set[TokenRole] = set()
+
+    for i, tok in enumerate(result):
+        if tok.role is not TokenRole.UNKNOWN:
+            continue
+        for role, matches in matchers:
+            if role in seen:
+                continue
+            if matches(tok.text):
+                result[i] = tok.with_role(role)
+                seen.add(role)
+                break
+
+    # 4) Title = leftmost contiguous UNKNOWN tokens.
+    for i, tok in enumerate(result):
+        if tok.role is not TokenRole.UNKNOWN:
+            break
+        result[i] = tok.with_role(TokenRole.TITLE)
+
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Stage 2c — enricher pass (non-positional roles)
+# ---------------------------------------------------------------------------
+
+
+def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
+    """Tag the remaining UNKNOWN tokens with non-positional roles.
+
+    Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
+    a single-token ``DTS``). For each sequence match, the first token
+    receives the role + ``extra["sequence"]`` (the canonical joined
+    value), and the trailing members are marked with the same role +
+    ``extra["sequence_member"]="True"`` (a string flag) so
+    :func:`assemble` extracts the value only from the primary.
+    """
+    result = list(tokens)
+
+    # Multi-token sequences first.
+    _apply_sequences(
+        result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
+    )
+    _apply_sequences(
+        result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
+    )
+    _apply_sequences(
+        result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
+    )
+
+    # Single tokens.
+    known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
+    known_audio_channels = set(kb.audio.get("channels", []))
+    known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
+    known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
+    known_editions = {t.upper() for t in kb.editions.get("tokens", [])}
+
+    # Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
+    # because "." is a separator. Detect consecutive pairs whose joined
+    # value (without any trailing "-GROUP") is in the channel set.
+    _detect_channel_pairs(result, known_audio_channels)
+
+    for i, tok in enumerate(result):
+        if tok.role is not TokenRole.UNKNOWN:
+            continue
+        text = tok.text
+        upper = text.upper()
+        lower = text.lower()
+
+        if upper in known_audio_codecs:
+            result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
+            continue
+        if text in known_audio_channels:
+            result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
+            continue
+        if upper in known_hdr:
+            result[i] = tok.with_role(TokenRole.HDR)
+            continue
+        if lower in known_bit_depth:
+            result[i] = tok.with_role(TokenRole.BIT_DEPTH)
+            continue
+        if upper in known_editions:
+            result[i] = tok.with_role(TokenRole.EDITION)
+            continue
+        if upper in kb.language_tokens:
+            result[i] = tok.with_role(TokenRole.LANGUAGE)
+            continue
+        if upper in kb.distributors:
+            result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
+            continue
+
+    return result
+
+
+def _apply_sequences(
+    tokens: list[Token],
+    sequences: list[dict],
+    value_key: str,
+    role: TokenRole,
+) -> None:
+    """Mark every non-overlapping occurrence of each sequence in place.
+
+    Mutates ``tokens`` (replacing entries with new role-tagged Token
+    instances). Sequences in the YAML must be ordered most-specific
+    first; the first match wins per starting position.
+    """
+    if not sequences:
+        return
+
+    upper_texts = [t.text.upper() for t in tokens]
+    consumed: set[int] = set()
+
+    for seq in sequences:
+        seq_upper = [s.upper() for s in seq["tokens"]]
+        n = len(seq_upper)
+        for start in range(len(tokens) - n + 1):
+            if any(idx in consumed for idx in range(start, start + n)):
+                continue
+            if any(
+                tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
+            ):
+                continue
+            if upper_texts[start : start + n] == seq_upper:
+                tokens[start] = tokens[start].with_role(
+                    role, sequence=seq[value_key]
+                )
+                for k in range(1, n):
+                    tokens[start + k] = tokens[start + k].with_role(
+                        role, sequence_member="True"
+                    )
+                consumed.update(range(start, start + n))
+
+
+def _detect_channel_pairs(
+    tokens: list[Token], known_channels: set[str]
+) -> None:
+    """Spot two consecutive numeric tokens that form a channel layout.
+
+    Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
+    ``-GROUP`` suffix on the second). The second token may be the trailing
+    codec-GROUP token, in which case it is already tagged and its role
+    is left untouched; only the first token then carries the channel value.
+    """
+    for i in range(len(tokens) - 1):
+        first = tokens[i]
+        second = tokens[i + 1]
+        if first.role is not TokenRole.UNKNOWN:
+            continue
+        # Strip a "-GROUP" suffix on the second token before joining.
+        second_text = second.text.split("-")[0]
+        candidate = f"{first.text}.{second_text}"
+        if candidate not in known_channels:
+            continue
+        # Only tag the first token (carries the channel value). The
+        # second token may legitimately remain UNKNOWN (or be the
+        # codec-GROUP token, already tagged CODEC).
+        tokens[i] = first.with_role(
+            TokenRole.AUDIO_CHANNELS, sequence=candidate
+        )
+        if second.role is TokenRole.UNKNOWN:
+            tokens[i + 1] = second.with_role(
+                TokenRole.AUDIO_CHANNELS, sequence_member="True"
+            )
+
+
+# ---------------------------------------------------------------------------
+# Stage 2 entry point
+# ---------------------------------------------------------------------------
+
+
+def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
+    """Annotate token roles.
+
+    Dispatch:
+
+    * If a group is detected AND has a known schema, run the EASY
+      structural walk. If the schema walk aborts on a mandatory chunk
+      mismatch, fall through to SHITTY (the heuristic still does better
+      than giving up).
+    * Otherwise run SHITTY — schema-less, best-effort, never aborts.
+
+    The enricher pass runs in both cases. The pipeline always returns a
+    populated token list; downstream callers don't need to distinguish
+    EASY vs SHITTY at this layer (the parse_path is decided in the
+    service based on whether a schema matched).
+    """
+    group_name, group_index = _detect_group(tokens, kb)
+
+    schema = kb.group_schema(group_name) if group_index is not None else None
+    if schema is not None and group_index is not None:
+        structural = _annotate_structural(tokens, kb, schema, group_index)
+        if structural is not None:
+            return _annotate_enrichers(structural, kb)
+
+    # SHITTY fallback — heuristic positional pass. ``_annotate_shitty``
+    # runs its own enricher pass internally (it has to, so the title
+    # scan can skip enricher-tagged tokens).
+    return _annotate_shitty(tokens, kb, group_index)
+
+
+def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
+    """Return True if ``tokens`` would take the EASY path in :func:`annotate`."""
+    group_name, group_index = _detect_group(tokens, kb)
+    if group_index is None:
+        return False
+    return kb.group_schema(group_name) is not None
+
+
+# ---------------------------------------------------------------------------
+# Stage 3 — assemble
+# ---------------------------------------------------------------------------
+
+
+def assemble(
+    annotated: list[Token],
+    site_tag: str | None,
+    raw_name: str,
+    kb: ReleaseKnowledge,
+) -> dict:
+    """Fold annotated tokens into a ``ParsedRelease``-compatible dict.
+
+    Returns a dict (not a ``ParsedRelease`` instance) so the caller can
+    layer in additional fields (``parse_path``, ``raw``, …) before
+    instantiation.
+    """
+    # Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
+    # human-friendly release names) carry no title content and would leak
+    # into the joined title as ``"Show.-.Episode"``. Drop them here.
+    title_parts = [
+        t.text
+        for t in annotated
+        if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
+    ]
+    title = ".".join(title_parts) if title_parts else (
+        annotated[0].text if annotated else raw_name
+    )
+
+    year: int | None = None
+    season: int | None = None
+    episode: int | None = None
+    episode_end: int | None = None
+    quality: str | None = None
+    source: str | None = None
+    codec: str | None = None
+    group = "UNKNOWN"
+    audio_codec: str | None = None
+    audio_channels: str | None = None
+    bit_depth: str | None = None
+    hdr_format: str | None = None
+    edition: str | None = None
+    distributor: str | None = None
+    languages: list[str] = []
+    is_season_range = False
+
+    for tok in annotated:
+        # Skip non-primary members of a multi-token sequence.
+        if tok.extra.get("sequence_member") == "True":
+            continue
+
+        role = tok.role
+        if role is TokenRole.YEAR:
+            year = int(tok.text)
+        elif role is TokenRole.SEASON_EPISODE:
+            parsed = _parse_season_episode(tok.text)
+            if parsed is not None:
+                season, episode, episode_end = parsed
+                # Detect Sxx-yy range form to flag it as a multi-season pack.
+                upper = tok.text.upper()
+                if (
+                    len(upper) == 6
+                    and upper[0] == "S"
+                    and upper[1:3].isdigit()
+                    and upper[3] == "-"
+                    and upper[4:6].isdigit()
+                ):
+                    is_season_range = True
+        elif role is TokenRole.RESOLUTION:
+            quality = tok.text
+        elif role is TokenRole.SOURCE:
+            source = tok.text
+        elif role is TokenRole.CODEC:
+            codec = tok.extra.get("codec", tok.text)
+            if "group" in tok.extra:
+                group = tok.extra["group"] or "UNKNOWN"
+        elif role is TokenRole.GROUP:
+            group = tok.extra.get("group", tok.text) or "UNKNOWN"
+        elif role is TokenRole.AUDIO_CODEC:
+            if audio_codec is None:
+                audio_codec = tok.extra.get("sequence", tok.text)
+        elif role is TokenRole.AUDIO_CHANNELS:
+            if audio_channels is None:
+                audio_channels = tok.extra.get("sequence", tok.text)
+        elif role is TokenRole.BIT_DEPTH:
+            if bit_depth is None:
+                bit_depth = tok.text.lower()
+        elif role is TokenRole.HDR:
+            if hdr_format is None:
+                hdr_format = tok.extra.get("sequence", tok.text.upper())
+        elif role is TokenRole.EDITION:
+            if edition is None:
+                edition = tok.extra.get("sequence", tok.text.upper())
+        elif role is TokenRole.LANGUAGE:
+            languages.append(tok.text.upper())
+        elif role is TokenRole.DISTRIBUTOR:
+            if distributor is None:
+                distributor = tok.text.upper()
+
+    # Media type heuristic. Doc/concert/integrale tokens win over the
+    # generic tech-based fallback. We look across all tokens (not just
+    # role-tagged ones) because these markers may be left UNKNOWN by the
+    # annotation passes; only the assemble step cares about them.
+    upper_tokens = {tok.text.upper() for tok in annotated}
+    doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
+    concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
+    integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
+
+    if upper_tokens & doc_tokens:
+        media_type = MediaTypeToken.DOCUMENTARY
+    elif upper_tokens & concert_tokens:
+        media_type = MediaTypeToken.CONCERT
+    elif is_season_range:
+        media_type = MediaTypeToken.TV_COMPLETE
+    elif (
+        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
+        or upper_tokens & integrale_tokens
+    ) and season is None:
+        media_type = MediaTypeToken.TV_COMPLETE
+    elif season is not None:
+        media_type = MediaTypeToken.TV_SHOW
+    elif any((quality, source, codec, year)):
+        media_type = MediaTypeToken.MOVIE
+    else:
+        media_type = MediaTypeToken.UNKNOWN
+
+    return {
+        "title": title,
+        "title_sanitized": kb.sanitize_for_fs(title),
+        "year": year,
+        "season": season,
+        "episode": episode,
+        "episode_end": episode_end,
+        "quality": quality,
+        "source": source,
+        "codec": codec,
+        "group": group,
+        "media_type": media_type,
+        "site_tag": site_tag,
+        "languages": tuple(languages),
+        "audio_codec": audio_codec,
+        "audio_channels": audio_channels,
+        "bit_depth": bit_depth,
+        "hdr_format": hdr_format,
+        "edition": edition,
+        "distributor": distributor,
+    }
@@ -0,0 +1,47 @@
+"""Group schema value objects.
+
+A :class:`GroupSchema` describes the canonical chunk layout of releases
+from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
+contract: when a release ends in ``-<GROUP>`` and we know the group,
+the annotator walks the schema instead of running the heuristic SHITTY
+matchers.
+
+Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
+by an infrastructure adapter and surfaced via the
+:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from .tokens import TokenRole
+
+
+@dataclass(frozen=True)
+class SchemaChunk:
+    """One entry in a group's chunk order.
+
+    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
+    is True for chunks that may be absent (e.g. ``year`` on TV releases,
+    ``source`` on bare ELiTE TV releases).
+    """
+
+    role: TokenRole
+    optional: bool = False
+
+
+@dataclass(frozen=True)
+class GroupSchema:
+    """Schema for a known release group.
+
+    ``chunks`` is the left-to-right canonical order. The annotator walks
+    tokens and chunks in lockstep: an optional chunk that doesn't match
+    the current token is skipped (the chunk index advances, the token
+    index stays); a mandatory chunk that doesn't match aborts the EASY
+    path and falls back to SHITTY.
+    """
+
+    name: str
+    separator: str
+    chunks: tuple[SchemaChunk, ...]
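Review note: the lockstep walk described in the `GroupSchema` docstring can be sketched as below. This is a minimal illustration under stated assumptions, not the annotator shipped in this PR: `Role`, `Chunk`, `matches`, `walk` and `SCHEMA` are hypothetical stand-ins, each chunk consumes exactly one token (real titles span several), and the matcher uses hard-coded rules where the real code would consult the knowledge base.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Role(str, Enum):  # hypothetical stand-in for TokenRole
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"
    GROUP = "group"


@dataclass(frozen=True)
class Chunk:  # mirrors the shape of SchemaChunk
    role: Role
    optional: bool = False


def matches(role: Role, token: str) -> bool:
    """Hypothetical matcher with hard-coded rules (assumption)."""
    if role is Role.YEAR:
        return len(token) == 4 and token.isdigit()
    if role is Role.RESOLUTION:
        return token.lower() in {"720p", "1080p", "2160p"}
    return True  # TITLE and GROUP accept any token in this sketch


def walk(tokens: list[str], chunks: tuple[Chunk, ...]) -> list[tuple[str, Role]] | None:
    """Return (token, role) pairs, or None when the EASY path must abort."""
    out: list[tuple[str, Role]] = []
    ti = 0  # token index
    for chunk in chunks:  # chunk index advances on every iteration
        if ti < len(tokens) and matches(chunk.role, tokens[ti]):
            out.append((tokens[ti], chunk.role))
            ti += 1
        elif chunk.optional:
            continue  # optional chunk skipped, token index stays
        else:
            return None  # mandatory chunk failed: caller falls back to SHITTY
    # Leftover tokens mean the schema did not describe the whole name.
    return out if ti == len(tokens) else None


SCHEMA = (
    Chunk(Role.TITLE),
    Chunk(Role.YEAR, optional=True),
    Chunk(Role.RESOLUTION),
    Chunk(Role.GROUP),
)
```

With this schema, `walk(["Archer", "1080p", "KONTRAST"], SCHEMA)` skips the optional year chunk, while `walk(["Archer", "KONTRAST"], SCHEMA)` returns `None` because the mandatory resolution chunk cannot match.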
@@ -0,0 +1,139 @@
+"""Parse-confidence scoring.
+
+``parse_release`` returns a :class:`ParseReport` alongside its
+:class:`ParsedRelease`. The report carries:
+
+- ``confidence``: integer 0–100 derived from which structural and
+  technical fields got populated, minus a penalty per UNKNOWN token
+  left in the annotated stream.
+- ``road``: which of the three roads the parse took
+  (:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
+- ``unknown_tokens``: textual residue, useful for diagnostics.
+- ``missing_critical``: structural fields the score-tally found absent
+  (e.g. ``("year", "media_type")``) — the caller can use this to drive
+  PoP recovery (questions, LLM call).
+
+All weights, penalties and thresholds come from the injected knowledge
+base (``kb.scoring``), itself loaded from
+``alfred/knowledge/release/scoring.yaml``. No magic numbers here.
+
+The scoring functions are pure — they consume the annotated token list
+and the resulting :class:`ParsedRelease` and return the report. They are
+called by ``services.parse_release`` after ``assemble`` has run.
+"""
+
+from __future__ import annotations
+
+from enum import Enum
+
+from ..ports.knowledge import ReleaseKnowledge
+from ..value_objects import ParsedRelease
+from .tokens import Token, TokenRole
+
+
+class Road(str, Enum):
+    """How the parser handled a given release name.
+
+    Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
+    which records the tokenization route (DIRECT / SANITIZED / AI). Road
+    is about confidence in the *result*, not the *method*.
+    """
+
+    EASY = "easy"  # group schema matched — structural annotation
+    SHITTY = "shitty"  # no schema, dict-driven annotation, score ≥ threshold
+    PATH_OF_PAIN = "path_of_pain"  # score below threshold, needs help
+
+
+# Critical structural fields — their absence drives the
+# ``missing_critical`` list in the report.
+_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")
+
+
+def _is_tv_shaped(parsed: ParsedRelease) -> bool:
+    """Season/episode weights only count for releases that *look* like TV."""
+    return parsed.season is not None
+
+
+def compute_score(
+    parsed: ParsedRelease,
+    annotated: list[Token],
+    kb: ReleaseKnowledge,
+) -> int:
+    """Compute a 0–100 confidence score for the parse.
+
+    Each populated field contributes its weight from
+    ``kb.scoring["weights"]``. Season/episode only count when the parse
+    looks like TV. ``group == "UNKNOWN"`` is treated as absent.
+
+    Then a penalty is subtracted per residual UNKNOWN token in
+    ``annotated``, capped at ``penalties["max_unknown_penalty"]``.
+
+    Result is clamped to ``[0, 100]``.
+    """
+    weights = kb.scoring["weights"]
+    penalties = kb.scoring["penalties"]
+
+    score = 0
+    if parsed.title:
+        score += weights.get("title", 0)
+    if parsed.media_type and parsed.media_type.value != "unknown":
+        score += weights.get("media_type", 0)
+    if parsed.year is not None:
+        score += weights.get("year", 0)
+    if _is_tv_shaped(parsed):
+        if parsed.season is not None:
+            score += weights.get("season", 0)
+        if parsed.episode is not None:
+            score += weights.get("episode", 0)
+    if parsed.quality:
+        score += weights.get("resolution", 0)
+    if parsed.source:
+        score += weights.get("source", 0)
+    if parsed.codec:
+        score += weights.get("codec", 0)
+    if parsed.group and parsed.group != "UNKNOWN":
+        score += weights.get("group", 0)
+
+    unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
+    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
+    capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
+    score -= capped_penalty
+
+    return max(0, min(100, score))
+
+
+def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
+    """Return the text of every token still tagged UNKNOWN."""
+    return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)
+
+
+def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
+    """Return the names of critical structural fields that are absent."""
+    missing: list[str] = []
+    if not parsed.title:
+        missing.append("title")
+    if not parsed.media_type or parsed.media_type.value == "unknown":
+        missing.append("media_type")
+    if parsed.year is None:
+        missing.append("year")
+    return tuple(missing)
+
+
+def decide_road(
+    score: int,
+    has_schema: bool,
+    kb: ReleaseKnowledge,
+) -> Road:
+    """Pick the road the parse took.
+
+    EASY is decided structurally: if a known group schema matched, the
+    annotation walked the schema, and that's enough — the score does not
+    veto EASY. Otherwise the score decides between SHITTY and
+    PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
+    """
+    if has_schema:
+        return Road.EASY
+    threshold = kb.scoring["thresholds"].get("shitty_min", 60)
+    if score >= threshold:
+        return Road.SHITTY
+    return Road.PATH_OF_PAIN
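Review note: the arithmetic behind `compute_score` and `decide_road` can be followed with a small standalone sketch. The weights, penalties and threshold in `SCORING` are invented for illustration (the real values live in `scoring.yaml`, which is not shown in this diff), and `score` / `road` are hypothetical reductions that take a set of populated field names instead of a `ParsedRelease`.

```python
from __future__ import annotations

# Hypothetical values; the real ones are loaded from scoring.yaml.
SCORING = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15,
        "resolution": 10, "source": 10, "codec": 10, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 30},
    "thresholds": {"shitty_min": 60},
}


def score(populated: set[str], unknown_count: int, scoring: dict) -> int:
    """Sum the weights of populated fields, subtract the capped penalty."""
    weights = scoring["weights"]
    penalties = scoring["penalties"]
    total = sum(weights.get(name, 0) for name in populated)
    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
    total -= min(raw_penalty, penalties.get("max_unknown_penalty", 0))
    return max(0, min(100, total))  # clamp to [0, 100]


def road(value: int, has_schema: bool, scoring: dict) -> str:
    """A matched schema always means EASY; otherwise the threshold decides."""
    if has_schema:
        return "easy"
    threshold = scoring["thresholds"].get("shitty_min", 60)
    return "shitty" if value >= threshold else "path_of_pain"
```

Under these assumed numbers, a parse with title, media type, year and resolution plus two UNKNOWN tokens scores 75 - 10 = 65 and lands on SHITTY; a title-only parse with ten UNKNOWN tokens scores 30 - 30 = 0 and lands on PATH_OF_PAIN unless a schema matched.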
@@ -0,0 +1,90 @@
+"""Token value objects for the annotate-based parser.
+
+A :class:`Token` carries both the original substring and its position in
+the original release name's token stream. A :class:`TokenRole` is the
+semantic tag assigned by the annotator.
+
+Why VOs instead of bare ``str``: the annotate step needs to flag tokens
+without consuming them (a token may carry residual info — e.g. a
+``codec-GROUP`` token contributes both a CODEC and a GROUP role). Tracking
+the index also lets later stages reason about *order* (year must come
+after title, group must be rightmost, etc.) without re-scanning the list.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from enum import Enum
+
+
+class TokenRole(str, Enum):
+    """Semantic role a token can take after annotation.
+
+    A token starts as ``UNKNOWN`` and may be promoted by the annotator.
+    ``str``-backed for cheap comparisons and YAML/JSON interop.
+
+    Roles split into three families:
+
+    - **structural**: TITLE / YEAR / SEASON_EPISODE / GROUP — drive folder
+      and filename naming.
+    - **technical**: RESOLUTION / SOURCE / CODEC / AUDIO_CODEC /
+      AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE /
+      DISTRIBUTOR; these feed ``tech_string`` and metadata fields.
+    - **meta**: SITE_TAG (stripped pre-tokenize) and UNKNOWN
+      (residual; each leftover UNKNOWN token lowers the confidence
+      score).
+    """
+
+    UNKNOWN = "unknown"
+
+    # Structural
+    TITLE = "title"
+    YEAR = "year"
+    SEASON_EPISODE = "season_episode"
+    GROUP = "group"
+
+    # Technical
+    RESOLUTION = "resolution"
+    SOURCE = "source"
+    CODEC = "codec"
+    AUDIO_CODEC = "audio_codec"
+    AUDIO_CHANNELS = "audio_channels"
+    BIT_DEPTH = "bit_depth"
+    HDR = "hdr"
+    EDITION = "edition"
+    LANGUAGE = "language"
+    DISTRIBUTOR = "distributor"
+
+    # Meta
+    SITE_TAG = "site_tag"
+
+
+@dataclass(frozen=True)
+class Token:
+    """An atomic token from a release name.
+
+    ``text`` is the substring exactly as it appeared after tokenization
+    (case preserved — uppercase comparisons happen at match time).
+    ``index`` is the 0-based position in the tokenized stream, used by
+    downstream stages to enforce ordering invariants.
+
+    ``role`` defaults to :attr:`TokenRole.UNKNOWN`. The annotator returns
+    new :class:`Token` instances with the role set rather than mutating
+    (the dataclass is frozen). ``extra`` carries role-specific payload
+    when the token text alone isn't enough (e.g. a ``codec-GROUP`` token
+    annotated as CODEC may record the group name in ``extra["group"]``).
+    """
+
+    text: str
+    index: int
+    role: TokenRole = TokenRole.UNKNOWN
+    extra: dict[str, str] = field(default_factory=dict)
+
+    def with_role(self, role: TokenRole, **extra: str) -> Token:
+        """Return a copy of this token with ``role`` (and optional ``extra``)."""
+        merged = {**self.extra, **extra}  # always copy; never share the dict
+        return Token(text=self.text, index=self.index, role=role, extra=merged)
+
+    @property
+    def is_annotated(self) -> bool:
+        return self.role is not TokenRole.UNKNOWN
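Review note: the copy-on-annotate pattern described in the `Token` docstring can be tried in isolation. `Tok` below is a hypothetical, reduced stand-in (roles are plain strings rather than `TokenRole` members), and building the copy with `dataclasses.replace` is a choice made for this sketch, not necessarily how the PR builds it.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Tok:  # hypothetical stand-in for Token
    text: str
    index: int
    role: str = "unknown"
    extra: dict[str, str] = field(default_factory=dict)

    def with_role(self, role: str, **extra: str) -> Tok:
        # A fresh dict on every call keeps the copies independent.
        return replace(self, role=role, extra={**self.extra, **extra})


raw = Tok(text="x265-KONTRAST", index=7)
codec = raw.with_role("codec", codec="x265", group="KONTRAST")
```

The original token is untouched: `raw.role` is still `"unknown"` and `raw.extra` is still empty, while `codec` carries both the codec and the group payload for the assemble step.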
@@ -0,0 +1,10 @@
+"""Domain ports for the release domain.
+
+Protocol-based abstractions that decouple ``parse_release`` and
+``ParsedRelease`` from any concrete knowledge-base loader. The
+infrastructure layer provides the adapter that satisfies this contract.
+"""
+
+from .knowledge import ReleaseKnowledge
+
+__all__ = ["ReleaseKnowledge"]
@@ -0,0 +1,91 @@
+"""ReleaseKnowledge port — the read-only query surface that
+``parse_release`` and ``ParsedRelease`` need from the release knowledge
+base, expressed as a structural Protocol so the domain never imports any
+concrete loader.
+
+The concrete YAML-backed implementation lives in
+``alfred/infrastructure/knowledge/release_kb.py``. Tests can supply any
+object that satisfies this shape (e.g. a simple dataclass).
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Protocol
+
+if TYPE_CHECKING:
+    from ..parser.schema import GroupSchema
+
+
+class ReleaseKnowledge(Protocol):
+    """Read-only snapshot of release-name parsing knowledge."""
+
+    # --- Token sets used by the tokenizer / matchers ---
+
+    resolutions: set[str]
+    sources: set[str]
+    codecs: set[str]
+    distributors: set[str]
+    language_tokens: set[str]
+    forbidden_chars: set[str]
+    hdr_extra: set[str]
+
+    # --- Structured knowledge (loaded from YAML as dicts) ---
+
+    audio: dict
+    video_meta: dict
+    editions: dict
+    media_type_tokens: dict
+
+    # --- Tokenizer separators ---
+
+    separators: list[str]
+
+    # --- Parse scoring (Phase A) ---
+    #
+    # ``scoring`` is a dict with three keys:
+    #   - ``weights``:     dict[field_name, int]   field weight contribution
+    #   - ``penalties``:   {"unknown_token": int, "max_unknown_penalty": int}
+    #   - ``thresholds``:  {"shitty_min": int}     SHITTY vs PATH_OF_PAIN cutoff
+    #
+    # Concrete values come from ``alfred/knowledge/release/scoring.yaml``.
+    # The loader fills in safe defaults so this dict is always populated.
+
+    scoring: dict
+
+    # --- ffprobe → scene-token translation tables (consumed by
+    #     ``application.release.enrich_from_probe``). Domain parsing itself
+    #     doesn't touch these — exposed on the same KB to keep release
+    #     knowledge in a single ownership point.
+    #
+    #     Shape:
+    #       - ``video_codec``:    dict[str, str]   ffprobe lower → scene token
+    #       - ``audio_codec``:    dict[str, str]   ffprobe lower → scene token
+    #       - ``audio_channels``: dict[int, str]   channel count → layout ---
+
+    probe_mappings: dict
+
+    # --- File-extension sets (used by application/infra modules that work
+    #     directly with filesystem paths, e.g. media-type detection, video
+    #     lookup). Domain parsing itself doesn't touch these. ---
+
+    video_extensions: set[str]
+    non_video_extensions: set[str]
+    subtitle_extensions: set[str]
+    metadata_extensions: set[str]
+
+    # --- Filesystem sanitization (Option B: pre-sanitize at parse time) ---
+
+    def sanitize_for_fs(self, text: str) -> str:
+        """Strip filesystem-forbidden characters from ``text``."""
+        ...
+
+    # --- Release group schemas (EASY path) ---
+
+    def group_schema(self, name: str) -> GroupSchema | None:
+        """Return the parsing schema for the named release group, or
+        ``None`` if the group is unknown (caller falls back to SHITTY).
+
+        Lookup is case-insensitive: ``"KONTRAST"``, ``"kontrast"`` and
+        ``"Kontrast"`` all resolve to the same schema.
+        """
+        ...
@@ -1,58 +1,70 @@
-"""Release domain — parsing service."""
+"""Release domain — parsing service.
+
+Thin orchestrator over the annotate-based pipeline in
+:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:
+
+* Strip apostrophes and a leading/trailing ``[site.tag]``; set ``parse_path``.
+* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
+  the LLM can clean them up.
+* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
+  wrap the result in :class:`ParsedRelease`.
+* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
+  via :mod:`alfred.domain.release.parser.scoring`.
+
+The public entry point is :func:`parse_release`, which returns
+``(ParsedRelease, ParseReport)``. The report carries the confidence
+score, the road, and diagnostic info for downstream callers.
+"""

 from __future__ import annotations

-import re
-
-from alfred.infrastructure.knowledge.release import load_separators
-from .value_objects import (
-    _AUDIO,
-    _CODECS,
-    _EDITIONS,
-    _FORBIDDEN_CHARS,
-    _HDR_EXTRA,
-    _LANGUAGE_TOKENS,
-    _MEDIA_TYPE_TOKENS,
-    _RESOLUTIONS,
-    _SOURCES,
-    _VIDEO_META,
-    MediaTypeToken,
-    ParsedRelease,
-    ParsePath,
-)
+from .parser import pipeline as _v2
+from .parser import scoring as _scoring
+from .ports import ReleaseKnowledge
+from .value_objects import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute


-def _tokenize(name: str) -> list[str]:
-    """Split a release name on the configured separators, dropping empty tokens."""
-    pattern = "[" + re.escape("".join(load_separators())) + "]+"
-    return [t for t in re.split(pattern, name) if t]
+def parse_release(
+    name: str, kb: ReleaseKnowledge
+) -> tuple[ParsedRelease, ParseReport]:
+    """Parse a release name.

-
-def parse_release(name: str) -> ParsedRelease:
-    """
-    Parse a release name and return a ParsedRelease.
+    Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
+    is unchanged from the previous single-return contract; the report
+    is new and carries the confidence score + road decision.

    Flow:
-      1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
-      2. Check the remainder for truly forbidden chars (anything not in the
-         configured separators list). If any remain → media_type="unknown",
-         parse_path="ai", and the LLM handles it.
-      3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
-         and run token-level matchers (season/episode, tech, languages, audio,
-         video, edition, title, year).
+
+    1. Strip apostrophes, then a leading/trailing ``[site.tag]`` if
+       present (either one sets ``parse_path="sanitized"``).
+    2. If the remainder still contains forbidden chars that are not
+       configured separators, short-circuit to
+       ``media_type="unknown"`` / ``parse_path="ai"`` and emit a
+       PATH_OF_PAIN report; the LLM handles these.
+    3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
+       group schema is known, SHITTY otherwise) → assemble → score.
    """
-    parse_path = ParsePath.DIRECT.value
+    parse_path = TokenizationRoute.DIRECT

-    # Always try to extract a bracket-enclosed site tag first.
-    clean, site_tag = _strip_site_tag(name)
+    # Apostrophes inside titles ("Don't", "L'avare") are common and should
+    # not push the release through the AI fallback. Strip them up front so
+    # both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
+    # enough for token-level matching. The raw name is preserved on the VO.
+    working_name = name
+    if "'" in working_name:
+        working_name = working_name.replace("'", "")
+        parse_path = TokenizationRoute.SANITIZED
+
+    clean, site_tag = _v2.strip_site_tag(working_name)
    if site_tag is not None:
-        parse_path = ParsePath.SANITIZED.value
+        parse_path = TokenizationRoute.SANITIZED

-    if not _is_well_formed(clean):
-        return ParsedRelease(
+    if not _is_well_formed(clean, kb):
+        parsed = ParsedRelease(
            raw=name,
-            normalised=clean,
+            clean=clean,
            title=clean,
+            title_sanitized=kb.sanitize_for_fs(clean),
            year=None,
            season=None,
            episode=None,
@@ -61,448 +73,49 @@ def parse_release(name: str) -> ParsedRelease:
            source=None,
            codec=None,
            group="UNKNOWN",
-            tech_string="",
-            media_type=MediaTypeToken.UNKNOWN.value,
+            media_type=MediaTypeToken.UNKNOWN,
            site_tag=site_tag,
-            parse_path=ParsePath.AI.value,
+            parse_path=TokenizationRoute.AI,
        )
-
-    name = clean
-    tokens = _tokenize(name)
-
-    season, episode, episode_end = _extract_season_episode(tokens)
-    quality, source, codec, group, tech_tokens = _extract_tech(tokens)
-    languages, lang_tokens = _extract_languages(tokens)
-    audio_codec, audio_channels, audio_tokens = _extract_audio(tokens)
-    bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens)
-    edition, edition_tokens = _extract_edition(tokens)
-    title = _extract_title(
-        tokens,
-        tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
-    )
-    year = _extract_year(tokens, title)
-    media_type = _infer_media_type(
-        season, quality, source, codec, year, edition, tokens
+        report = ParseReport(
+            confidence=0,
+            road=_scoring.Road.PATH_OF_PAIN.value,
+            unknown_tokens=(clean,),
+            missing_critical=("title", "media_type", "year"),
        )
+        return parsed, report

-    tech_parts = [p for p in [quality, source, codec] if p]
-    tech_string = ".".join(tech_parts)
+    tokens, v2_tag = _v2.tokenize(working_name, kb)
+    annotated = _v2.annotate(tokens, kb)
+    fields = _v2.assemble(annotated, v2_tag, name, kb)

-    return ParsedRelease(
+    parsed = ParsedRelease(
        raw=name,
-        normalised=name,
-        title=title,
-        year=year,
-        season=season,
-        episode=episode,
-        episode_end=episode_end,
-        quality=quality,
-        source=source,
-        codec=codec,
-        group=group,
-        tech_string=tech_string,
-        media_type=media_type,
-        site_tag=site_tag,
+        clean=clean,
        parse_path=parse_path,
-        languages=languages,
-        audio_codec=audio_codec,
-        audio_channels=audio_channels,
-        bit_depth=bit_depth,
-        hdr_format=hdr_format,
-        edition=edition,
+        **fields,
    )

-
-def _infer_media_type(
-    season: int | None,
-    quality: str | None,
-    source: str | None,
-    codec: str | None,
-    year: int | None,
-    edition: str | None,
-    tokens: list[str],
-) -> str:
-    """
-    Infer media_type from token-level evidence only (no filesystem access).
-
-    - documentary  : DOC token present
-    - concert      : CONCERT token present
-    - tv_complete  : INTEGRALE/COMPLETE token, no season
-    - tv_show      : season token found
-    - movie        : no season, at least one tech marker
-    - unknown      : no conclusive evidence
-    """
-    upper_tokens = {t.upper() for t in tokens}
-
-    doc_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("doc", [])}
-    concert_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("concert", [])}
-    integrale_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("integrale", [])}
-
-    if upper_tokens & doc_tokens:
-        return MediaTypeToken.DOCUMENTARY.value
-    if upper_tokens & concert_tokens:
-        return MediaTypeToken.CONCERT.value
-    if (
-        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
-        or upper_tokens & integrale_tokens
-    ) and season is None:
-        return MediaTypeToken.TV_COMPLETE.value
-    if season is not None:
-        return MediaTypeToken.TV_SHOW.value
-    if any([quality, source, codec, year]):
-        return MediaTypeToken.MOVIE.value
-    return MediaTypeToken.UNKNOWN.value
-
-
-def _is_well_formed(name: str) -> bool:
-    """Return True if name contains no forbidden characters per scene naming rules.
-
-    Characters listed as token separators (spaces, brackets, parens, …) are NOT
-    considered malforming — the tokenizer handles them. Only truly broken chars
-    like '@', '#', '!', '%' make a name malformed.
-    """
-    tokenizable = set(load_separators())
-    return not any(c in name for c in _FORBIDDEN_CHARS if c not in tokenizable)
-
-
-def _strip_site_tag(name: str) -> tuple[str, str | None]:
-    """
-    Strip a site watermark tag from the release name and return (clean_name, tag).
-
-    Handles two positions:
-    - Prefix:  "[ OxTorrent.vc ] The.Title.S01..."
-    - Suffix:  "The.Title.S01...-NTb[TGx]"
-
-    Anything between [...] is treated as a site tag.
-    Returns (original_name, None) if no tag found.
-    """
-    s = name.strip()
-
-    if s.startswith("["):
-        close = s.find("]")
-        if close != -1:
-            tag = s[1:close].strip()
-            remainder = s[close + 1 :].strip()
-            if tag and remainder:
-                return remainder, tag
-
-    if s.endswith("]"):
-        open_bracket = s.rfind("[")
-        if open_bracket != -1:
-            tag = s[open_bracket + 1 : -1].strip()
-            remainder = s[:open_bracket].strip()
-            if tag and remainder:
-                return remainder, tag
-
-    return s, None
-
-
-def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
-    """
-    Parse a single token as a season/episode marker.
-
-    Handles:
-      - SxxExx / SxxExxExx / Sxx        (canonical scene form)
-      - NxNN / NxNNxNN                  (alt form: 1x05, 12x07x08)
-
-    Returns (season, episode, episode_end) or None if not a season token.
-    """
-    upper = tok.upper()
-
-    # SxxExx form
-    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
-        season = int(upper[1:3])
-        rest = upper[3:]
-
-        if not rest:
-            return season, None, None
-
-        episodes: list[int] = []
-        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
-            episodes.append(int(rest[1:3]))
-            rest = rest[3:]
-
-        if not episodes:
-            return None  # malformed token like "S03XYZ"
-
-        return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
-
-    # NxNN form — split on "X" (uppercased), all parts must be digits
-    if "X" in upper:
-        parts = upper.split("X")
-        if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
-            season = int(parts[0])
-            episode = int(parts[1])
-            episode_end = int(parts[2]) if len(parts) >= 3 else None
-            return season, episode, episode_end
-
-    return None
-
-
-def _extract_season_episode(
-    tokens: list[str],
-) -> tuple[int | None, int | None, int | None]:
-    for tok in tokens:
-        parsed = _parse_season_episode(tok)
-        if parsed is not None:
-            return parsed
-    return None, None, None
-
-
-def _extract_tech(
-    tokens: list[str],
-) -> tuple[str | None, str | None, str | None, str, set[str]]:
-    """
-    Extract quality, source, codec, group from tokens.
-
-    Returns (quality, source, codec, group, tech_token_set).
-
-    Group extraction strategy (in priority order):
-    1. Token where prefix is a known codec: x265-GROUP
-    2. Rightmost token with a dash that isn't a known source
-    """
-    quality: str | None = None
-    source: str | None = None
-    codec: str | None = None
-    group = "UNKNOWN"
-    tech_tokens: set[str] = set()
-
-    for tok in tokens:
-        tl = tok.lower()
-
-        if tl in _RESOLUTIONS:
-            quality = tok
-            tech_tokens.add(tok)
-            continue
-
-        if tl in _SOURCES:
-            source = tok
-            tech_tokens.add(tok)
-            continue
-
-        if "-" in tok:
-            parts = tok.rsplit("-", 1)
-            # codec-GROUP (highest priority for group)
-            if parts[0].lower() in _CODECS:
-                codec = parts[0]
-                group = parts[1] if parts[1] else "UNKNOWN"
-                tech_tokens.add(tok)
-                continue
-            # source with dash: Web-DL, WEB-DL, etc.
-            if parts[0].lower() in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
-                source = tok
-                tech_tokens.add(tok)
-                continue
-
-        if tl in _CODECS:
-            codec = tok
-            tech_tokens.add(tok)
-
-    # Fallback: rightmost token with a dash that isn't a known source
-    if group == "UNKNOWN":
-        for tok in reversed(tokens):
-            if "-" in tok:
-                parts = tok.rsplit("-", 1)
-                tl = tok.lower()
-                if tl in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
-                    continue
-                if parts[1]:
-                    group = parts[1]
-                    break
-
-    return quality, source, codec, group, tech_tokens
-
-
-def _is_year_token(tok: str) -> bool:
-    """Return True if tok is a 4-digit year between 1900 and 2099."""
-    return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099
-
-
-def _extract_title(tokens: list[str], tech_tokens: set[str]) -> str:
-    """Extract the title portion: everything before the first season/year/tech token."""
-    title_parts = []
-    for tok in tokens:
-        if _parse_season_episode(tok) is not None:
-            break
-        if _is_year_token(tok):
-            break
-        if tok in tech_tokens or tok.lower() in _RESOLUTIONS | _SOURCES | _CODECS:
-            break
-        if "-" in tok and any(p.lower() in _CODECS | _SOURCES for p in tok.split("-")):
-            break
-        title_parts.append(tok)
-
-    return ".".join(title_parts) if title_parts else tokens[0]
-
-
-def _extract_year(tokens: list[str], title: str) -> int | None:
-    """Extract a 4-digit year from tokens (only after the title)."""
-    title_len = len(title.split("."))
-    for tok in tokens[title_len:]:
-        if _is_year_token(tok):
-            return int(tok)
-    return None
-
-
-# ---------------------------------------------------------------------------
-# Sequence matcher
-# ---------------------------------------------------------------------------
-
-
-def _match_sequences(
-    tokens: list[str],
-    sequences: list[dict],
-    key: str,
-) -> tuple[str | None, set[str]]:
-    """
-    Try to match multi-token sequences against consecutive tokens.
-
-    Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
-    Sequences must be ordered most-specific first in the YAML.
-    """
-    upper_tokens = [t.upper() for t in tokens]
-    for seq in sequences:
-        seq_upper = [s.upper() for s in seq["tokens"]]
-        n = len(seq_upper)
-        for i in range(len(upper_tokens) - n + 1):
-            if upper_tokens[i : i + n] == seq_upper:
-                matched = set(tokens[i : i + n])
-                return seq[key], matched
-    return None, set()
-
-
-# ---------------------------------------------------------------------------
-# Language extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_languages(tokens: list[str]) -> tuple[list[str], set[str]]:
-    """Extract language tokens. Returns (languages, matched_token_set)."""
-    languages = []
-    lang_tokens: set[str] = set()
-    for tok in tokens:
-        if tok.upper() in _LANGUAGE_TOKENS:
-            languages.append(tok.upper())
-            lang_tokens.add(tok)
-    return languages, lang_tokens
-
-
-# ---------------------------------------------------------------------------
-# Audio extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_audio(
-    tokens: list[str],
-) -> tuple[str | None, str | None, set[str]]:
-    """
-    Extract audio codec and channel layout.
-
-    Returns (audio_codec, audio_channels, matched_token_set).
-    Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
-    """
-    audio_codec: str | None = None
-    audio_channels: str | None = None
-    audio_tokens: set[str] = set()
-
-    known_codecs = {c.upper() for c in _AUDIO.get("codecs", [])}
-    known_channels = set(_AUDIO.get("channels", []))
-
-    # Try multi-token sequences first
-    matched_codec, matched_set = _match_sequences(
-        tokens, _AUDIO.get("sequences", []), "codec"
+    has_schema = _v2.has_known_schema(tokens, kb)
+    score = _scoring.compute_score(parsed, annotated, kb)
+    road = _scoring.decide_road(score, has_schema, kb)
+    report = ParseReport(
+        confidence=score,
+        road=road.value,
+        unknown_tokens=_scoring.collect_unknown_tokens(annotated),
+        missing_critical=_scoring.collect_missing_critical(parsed),
    )
-    if matched_codec:
-        audio_codec = matched_codec
-        audio_tokens |= matched_set
-
-    # Channel layouts like "5.1" or "7.1" are split into two tokens by normalize —
-    # detect them as consecutive pairs "X" + "Y" where "X.Y" is a known channel.
-    # The second token may have a "-GROUP" suffix (e.g. "1-KTH" → strip it).
-    for i in range(len(tokens) - 1):
-        second = tokens[i + 1].split("-")[0]
-        candidate = f"{tokens[i]}.{second}"
-        if candidate in known_channels and audio_channels is None:
-            audio_channels = candidate
-            audio_tokens.add(tokens[i])
-            audio_tokens.add(tokens[i + 1])
-
-    for tok in tokens:
-        if tok in audio_tokens:
-            continue
-        if tok.upper() in known_codecs and audio_codec is None:
-            audio_codec = tok
-            audio_tokens.add(tok)
-        elif tok in known_channels and audio_channels is None:
-            audio_channels = tok
-            audio_tokens.add(tok)
-
-    return audio_codec, audio_channels, audio_tokens
+    return parsed, report


-# ---------------------------------------------------------------------------
-# Video metadata extraction (bit depth, HDR)
-# ---------------------------------------------------------------------------
+def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
+    """Return True if ``name`` contains no forbidden characters per scene
+    naming rules.

-
-def _extract_video_meta(
-    tokens: list[str],
-) -> tuple[str | None, str | None, set[str]]:
+    Characters listed as token separators (spaces, brackets, parens, …)
+    do NOT make a name malformed, because the tokenizer handles them.
+    Only truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
+    malformed.
    """
-    Extract bit depth and HDR format.
-
-    Returns (bit_depth, hdr_format, matched_token_set).
-    """
-    bit_depth: str | None = None
-    hdr_format: str | None = None
-    video_tokens: set[str] = set()
-
-    known_hdr = {h.upper() for h in _VIDEO_META.get("hdr", [])} | _HDR_EXTRA
-    known_depth = {d.lower() for d in _VIDEO_META.get("bit_depth", [])}
-
-    # Try HDR sequences first
-    matched_hdr, matched_set = _match_sequences(
-        tokens, _VIDEO_META.get("sequences", []), "hdr"
-    )
-    if matched_hdr:
-        hdr_format = matched_hdr
-        video_tokens |= matched_set
-
-    for tok in tokens:
-        if tok in video_tokens:
-            continue
-        if tok.upper() in known_hdr and hdr_format is None:
-            hdr_format = tok.upper()
-            video_tokens.add(tok)
-        elif tok.lower() in known_depth and bit_depth is None:
-            bit_depth = tok.lower()
-            video_tokens.add(tok)
-
-    return bit_depth, hdr_format, video_tokens
-
-
-# ---------------------------------------------------------------------------
-# Edition extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_edition(tokens: list[str]) -> tuple[str | None, set[str]]:
-    """
-    Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
-
-    Returns (edition, matched_token_set).
-    """
-    known_tokens = {t.upper() for t in _EDITIONS.get("tokens", [])}
-
-    # Try multi-token sequences first
-    matched_edition, matched_set = _match_sequences(
-        tokens, _EDITIONS.get("sequences", []), "edition"
-    )
-    if matched_edition:
-        return matched_edition, matched_set
-
-    for tok in tokens:
-        if tok.upper() in known_tokens:
-            return tok.upper(), {tok}
-
-    return None, set()
+    tokenizable = set(kb.separators)
+    return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
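
The separator carve-out in `_is_well_formed` can be illustrated with a minimal sketch. `FakeKnowledge` and its sample character sets are hypothetical stand-ins for the `ReleaseKnowledge` port, which is not shown in this diff; only the two attributes read by the function (`separators`, `forbidden_chars`) are modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FakeKnowledge:
    """Hypothetical stand-in for the ReleaseKnowledge port (illustration only)."""

    separators: FrozenSet[str]
    forbidden_chars: FrozenSet[str]


def is_well_formed(name: str, kb: FakeKnowledge) -> bool:
    # Same logic as the diff: a forbidden char only counts if it is not
    # also a separator the tokenizer already knows how to split on.
    tokenizable = set(kb.separators)
    return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)


# Assumed sample data: brackets appear in BOTH sets on purpose.
kb = FakeKnowledge(
    separators=frozenset({".", " ", "[", "]", "(", ")"}),
    forbidden_chars=frozenset({"@", "#", "!", "%", "[", "]"}),
)

plain = is_well_formed("Oz.1997.1080p.WEBRip.x265-KONTRAST", kb)
bracketed = is_well_formed("[TGx] Oz.1997.1080p", kb)  # brackets are separators
broken = is_well_formed("Oz@1997", kb)  # "@" is forbidden and not a separator
print(plain, bracketed, broken)
```

Under these assumed sets, the bracketed name stays well-formed because `[` and `]` are filtered out of the forbidden check, while `@` still marks the name as malformed.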
@@ -1,55 +1,24 @@
-"""Release domain — value objects and token sets."""
+"""Release domain — value objects.
+
+This module is **pure**: no I/O, no YAML loading, no knowledge-base
+imports. All knowledge that the parser consumes is injected at runtime
+via the ``ReleaseKnowledge`` port (see ``ports/knowledge.py``).
+
+``ParsedRelease`` follows Option B of the snapshot-VO design: filesystem
+sanitization is performed once at parse time and stored in
+``title_sanitized``. The builder methods (``show_folder_name``,
+``episode_filename``, etc.) are therefore pure string-formatting and do
+**not** need access to any knowledge base — but they require the caller
+to pass already-sanitized TMDB strings. The use case is responsible for
+calling ``kb.sanitize_for_fs(tmdb_title)`` before invoking the builders.
+"""

 from __future__ import annotations

-from dataclasses import dataclass, field
+from dataclasses import dataclass
 from enum import Enum

 from ..shared.exceptions import ValidationError
-from alfred.infrastructure.knowledge.release import (
-    load_audio,
-    load_codecs,
-    load_editions,
-    load_forbidden_chars,
-    load_hdr_extra,
-    load_language_tokens,
-    load_media_type_tokens,
-    load_metadata_extensions,
-    load_non_video_extensions,
-    load_resolutions,
-    load_sources,
-    load_sources_extra,
-    load_subtitle_extensions,
-    load_video,
-    load_video_extensions,
-    load_win_forbidden_chars,
-)
-
-# Token sets — loaded once at import time from alfred/knowledge/release/
-_RESOLUTIONS: set[str] = load_resolutions()
-_SOURCES: set[str] = load_sources() | load_sources_extra()
-_CODECS: set[str] = load_codecs()
-_VIDEO_EXTENSIONS: set[str] = load_video_extensions()
-_NON_VIDEO_EXTENSIONS: set[str] = load_non_video_extensions()
-_SUBTITLE_EXTENSIONS: set[str] = load_subtitle_extensions()
-# Both metadata and subtitle extensions are ignored when deciding the media
-# type of a folder — neither is a conclusive signal for movie/tv/other.
-_METADATA_EXTENSIONS: set[str] = load_metadata_extensions() | _SUBTITLE_EXTENSIONS
-_FORBIDDEN_CHARS: set[str] = load_forbidden_chars()
-_LANGUAGE_TOKENS: set[str] = load_language_tokens()
-_AUDIO: dict = load_audio()
-_VIDEO_META: dict = load_video()
-_EDITIONS: dict = load_editions()
-_HDR_EXTRA: set[str] = load_hdr_extra()
-_MEDIA_TYPE_TOKENS: dict = load_media_type_tokens()
-
-# Translation table for stripping Windows-forbidden characters
-_WIN_FORBIDDEN_TABLE = str.maketrans("", "", "".join(load_win_forbidden_chars()))
-
-
-def _sanitize_for_fs(text: str) -> str:
-    """Remove Windows-forbidden characters from a string."""
-    return text.translate(_WIN_FORBIDDEN_TABLE)


 class MediaTypeToken(str, Enum):
@@ -71,19 +40,27 @@ class MediaTypeToken(str, Enum):
    UNKNOWN = "unknown"


-class ParsePath(str, Enum):
-    """How a ``ParsedRelease`` was produced. ``str``-backed for the same
-    reasons as :class:`MediaTypeToken`."""
+class TokenizationRoute(str, Enum):
+    """How a ``ParsedRelease`` was produced.
+
+    Records the **tokenization route** — i.e. whether the release name
+    was tokenized as-is (``DIRECT``), after a sanitization pass like
+    site-tag stripping or apostrophe removal (``SANITIZED``), or whether
+    structural parsing failed and an LLM rebuild is needed (``AI``).
+
+    This is **orthogonal** to :class:`~alfred.domain.release.parser.scoring.Road`
+    (EASY / SHITTY / PATH_OF_PAIN), which captures parser confidence and
+    is recorded on :class:`ParseReport`. Both can vary independently —
+    a SANITIZED name can still land on the EASY road if a group schema
+    matches the tokens after stripping.
+
+    ``str``-backed for the same reasons as :class:`MediaTypeToken`."""

    DIRECT = "direct"
    SANITIZED = "sanitized"
    AI = "ai"


-_VALID_MEDIA_TYPES: frozenset[str] = frozenset(m.value for m in MediaTypeToken)
-_VALID_PARSE_PATHS: frozenset[str] = frozenset(p.value for p in ParsePath)
-
-
 def _strip_episode_from_normalized(normalized: str) -> str:
    """
    Remove all episode parts (Exx) from a normalized release name, keeping Sxx.
@@ -103,13 +80,57 @@ def _strip_episode_from_normalized(normalized: str) -> str:
    return ".".join(result)


-@dataclass
+@dataclass(frozen=True)
+class ParseReport:
+    """Diagnostic report attached to a :class:`ParsedRelease`.
+
+    ``parse_release`` returns ``(ParsedRelease, ParseReport)``. The
+    report describes *how confident* the parser is in the result and
+    *which road* produced it. It is intentionally separate from
+    ``ParsedRelease`` so the structural VO stays free of meta-concerns
+    about its own quality.
+
+    Fields:
+
+    - ``confidence``: integer 0–100 (see :func:`parser.scoring.compute_score`).
+    - ``road``: ``"easy"`` / ``"shitty"`` / ``"path_of_pain"`` — distinct
+      from ``ParsedRelease.parse_path`` (which describes the
+      tokenization route, not the confidence tier).
+    - ``unknown_tokens``: tokens that finished annotation with role
+      UNKNOWN, in order of appearance.
+    - ``missing_critical``: names of critical structural fields the
+      parser couldn't fill (subset of ``{"title", "media_type", "year"}``).
+    """
+
+    confidence: int
+    road: str  # one of parser.scoring.Road values
+    unknown_tokens: tuple[str, ...] = ()
+    missing_critical: tuple[str, ...] = ()
+
+    def __post_init__(self) -> None:
+        if not (0 <= self.confidence <= 100):
+            raise ValidationError(
+                f"ParseReport.confidence out of range: {self.confidence}"
+            )
+
+
+@dataclass(frozen=True)
 class ParsedRelease:
-    """Structured representation of a parsed release name."""
+    """Structured representation of a parsed release name.
+
+    ``title_sanitized`` carries the filesystem-safe form of ``title`` (computed
+    by the parser at construction time using the injected knowledge base).
+    Builder methods rely on it being already-sanitized — see module docstring.
+
+    Frozen: enrichment passes (``detect_media_type``, ``enrich_from_probe``)
+    return a **new** ``ParsedRelease`` via ``dataclasses.replace`` rather
+    than mutating in place. ``languages`` is a tuple for the same reason.
+    """

    raw: str  # original release name (untouched)
-    normalised: str  # dots instead of spaces
+    clean: str  # raw minus site_tag and apostrophes — used by season_folder_name()
    title: str  # show/movie title (dots, no year/season/tech)
+    title_sanitized: str  # title with filesystem-forbidden chars stripped
    year: int | None  # movie year or show start year (from TMDB)
    season: int | None  # season number (None for movies)
    episode: int | None  # first episode number (None if season-pack)
@@ -118,18 +139,18 @@ class ParsedRelease:
    source: str | None  # WEBRip, BluRay, …
    codec: str | None  # x265, HEVC, …
    group: str  # release group, "UNKNOWN" if missing
-    tech_string: str  # quality.source.codec joined with dots
    media_type: MediaTypeToken = MediaTypeToken.UNKNOWN
    site_tag: str | None = (
        None  # site watermark stripped from name, e.g. "TGx", "OxTorrent.vc"
    )
-    parse_path: ParsePath = ParsePath.DIRECT
-    languages: list[str] = field(default_factory=list)  # ["MULTI", "VFF"], ["FRENCH"], …
+    parse_path: TokenizationRoute = TokenizationRoute.DIRECT
+    languages: tuple[str, ...] = ()  # ("MULTI", "VFF"), ("FRENCH",), …
    audio_codec: str | None = None  # "DTS-HD.MA", "DDP", "EAC3", …
    audio_channels: str | None = None  # "5.1", "7.1", "2.0", …
    bit_depth: str | None = None  # "10bit", "8bit", …
    hdr_format: str | None = None  # "DV", "HDR10", "DV.HDR10", …
    edition: str | None = None  # "UNRATED", "EXTENDED", "DIRECTORS.CUT", …
+    distributor: str | None = None  # "NF", "AMZN", "DSNP", … (streaming origin)

    def __post_init__(self) -> None:
        if not self.raw:
@@ -158,36 +179,41 @@ class ParsedRelease:
                    f"ParsedRelease.episode_end ({self.episode_end}) < "
                    f"episode ({self.episode})"
                )
-        # Coerce raw strings into their enum form (tolerant constructor).
        if not isinstance(self.media_type, MediaTypeToken):
-            try:
-                self.media_type = MediaTypeToken(self.media_type)
-            except ValueError:
            raise ValidationError(
-                    f"ParsedRelease.media_type invalid: {self.media_type!r} "
-                    f"(expected one of {sorted(_VALID_MEDIA_TYPES)})"
-                ) from None
-        if not isinstance(self.parse_path, ParsePath):
-            try:
-                self.parse_path = ParsePath(self.parse_path)
-            except ValueError:
+                f"ParsedRelease.media_type must be a MediaTypeToken, "
+                f"got {type(self.media_type).__name__}: {self.media_type!r}"
+            )
+        if not isinstance(self.parse_path, TokenizationRoute):
            raise ValidationError(
-                    f"ParsedRelease.parse_path invalid: {self.parse_path!r} "
-                    f"(expected one of {sorted(_VALID_PARSE_PATHS)})"
-                ) from None
+                f"ParsedRelease.parse_path must be a TokenizationRoute, "
+                f"got {type(self.parse_path).__name__}: {self.parse_path!r}"
+            )

    @property
    def is_season_pack(self) -> bool:
        return self.season is not None and self.episode is None

-    def show_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
+    @property
+    def tech_string(self) -> str:
+        """``quality.source.codec`` joined by dots, skipping ``None`` parts.
+
+        Derived on every access so it stays in sync with the underlying
+        fields — no manual refresh needed after enrichment.
+        """
+        return ".".join(p for p in (self.quality, self.source, self.codec) if p)
+
+    def show_folder_name(self, tmdb_title_safe: str, tmdb_year: int) -> str:
        """
        Build the series root folder name.

        Format: {Title}.{Year}.{Tech}-{Group}
        Example: Oz.1997.1080p.WEBRip.x265-KONTRAST
+
+        ``tmdb_title_safe`` must already be filesystem-safe (the caller is
+        expected to have run it through ``kb.sanitize_for_fs``).
        """
-        title_part = _sanitize_for_fs(tmdb_title).replace(" ", ".")
+        title_part = tmdb_title_safe.replace(" ", ".")
        tech = self.tech_string or "Unknown"
        return f"{title_part}.{tmdb_year}.{tech}-{self.group}"

@@ -199,44 +225,47 @@ class ParsedRelease:
        For a single-episode release we still strip the episode token so the
        folder can hold the whole season.
        """
-        return _strip_episode_from_normalized(self.normalised)
+        return _strip_episode_from_normalized(self.clean)

-    def episode_filename(self, tmdb_episode_title: str | None, ext: str) -> str:
+    def episode_filename(self, tmdb_episode_title_safe: str | None, ext: str) -> str:
        """
        Build the episode filename.

        Format: {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
        Example: Oz.S01E01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv

-        If tmdb_episode_title is None, omits the episode title segment.
+        ``tmdb_episode_title_safe`` must already be filesystem-safe; pass
+        ``None`` to omit the episode title segment.
        """
-        title_part = _sanitize_for_fs(self.title)
+        title_part = self.title_sanitized
        s = f"S{self.season:02d}" if self.season is not None else ""
        e = f"E{self.episode:02d}" if self.episode is not None else ""
        se = s + e

        ep_title = ""
-        if tmdb_episode_title:
-            ep_title = "." + _sanitize_for_fs(tmdb_episode_title).replace(" ", ".")
+        if tmdb_episode_title_safe:
+            ep_title = "." + tmdb_episode_title_safe.replace(" ", ".")

        tech = self.tech_string or "Unknown"
        ext_clean = ext.lstrip(".")
        return f"{title_part}.{se}{ep_title}.{tech}-{self.group}.{ext_clean}"

-    def movie_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
+    def movie_folder_name(self, tmdb_title_safe: str, tmdb_year: int) -> str:
        """
        Build the movie folder name.

        Format: {Title}.{Year}.{Tech}-{Group}
        Example: Inception.2010.1080p.BluRay.x265-GROUP
        """
-        return self.show_folder_name(tmdb_title, tmdb_year)
+        return self.show_folder_name(tmdb_title_safe, tmdb_year)

-    def movie_filename(self, tmdb_title: str, tmdb_year: int, ext: str) -> str:
+    def movie_filename(
+        self, tmdb_title_safe: str, tmdb_year: int, ext: str
+    ) -> str:
        """
        Build the movie filename (same as folder name + extension).

        Example: Inception.2010.1080p.BluRay.x265-GROUP.mkv
        """
        ext_clean = ext.lstrip(".")
-        return f"{self.movie_folder_name(tmdb_title, tmdb_year)}.{ext_clean}"
+        return f"{self.movie_folder_name(tmdb_title_safe, tmdb_year)}.{ext_clean}"
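
A short sketch of the enrichment pattern the frozen `ParsedRelease` implies: callers rebuild with `dataclasses.replace` instead of assigning fields, and the derived `tech_string` follows automatically. `MiniRelease` is a hypothetical, trimmed-down model with only the fields needed here, not the real class (it omits validation and most fields).

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniRelease:
    """Hypothetical cut-down ParsedRelease (illustration only)."""

    title: str
    quality: Optional[str]
    source: Optional[str]
    codec: Optional[str]
    group: str = "UNKNOWN"
    languages: Tuple[str, ...] = ()

    @property
    def tech_string(self) -> str:
        # Derived on every access, skipping None parts, as in the diff.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


parsed = MiniRelease(
    title="Oz", quality="1080p", source=None, codec=None, group="KONTRAST"
)

# Enrichment returns a NEW object; the original is left untouched.
enriched = replace(parsed, source="WEBRip", codec="x265")

print(parsed.tech_string)
print(enriched.tech_string)

# Direct mutation is rejected by the frozen dataclass.
try:
    parsed.codec = "x265"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)
```

Because `tech_string` is a property rather than a stored field, there is no stale copy to refresh after `replace`, which appears to be the motivation for dropping the stored `tech_string` field in this change.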
@@ -0,0 +1,267 @@
+"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
+
+These are the **container-view** dataclasses, populated from ffprobe output and
+used across the project to describe the content of a media file.
+
+Not to be confused with ``alfred.domain.subtitles.entities.SubtitleScanResult``
+which models a subtitle being **scanned/matched** (with confidence, raw tokens,
+file path, etc.). The two coexist by design — they describe the same real-world
+concept seen from two different bounded contexts.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+from .value_objects import Language
+
+__all__ = [
+    "AudioTrack",
+    "MediaInfo",
+    "MediaWithTracks",
+    "SubtitleTrack",
+    "VideoTrack",
+    "track_lang_matches",
+]
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Track types — one frozen dataclass per stream kind
+# ─────────────────────────────────────────────────────────────────────────────
+
+
+@dataclass(frozen=True)
+class AudioTrack:
+    """A single audio track as reported by ffprobe."""
+
+    index: int
+    codec: str | None  # aac, ac3, eac3, dts, truehd, flac, …
+    channels: int | None  # 2, 6 (5.1), 8 (7.1), …
+    channel_layout: str | None  # stereo, 5.1, 7.1, …
+    language: str | None  # ISO 639-2: fre, eng, und, …
+    is_default: bool = False
+
+
+@dataclass(frozen=True)
+class SubtitleTrack:
+    """A single embedded subtitle track as reported by ffprobe."""
+
+    index: int
+    codec: str | None  # subrip, ass, hdmv_pgs_subtitle, …
+    language: str | None  # ISO 639-2: fre, eng, und, …
+    is_default: bool = False
+    is_forced: bool = False
+
+
+@dataclass(frozen=True)
+class VideoTrack:
+    """A single video track as reported by ffprobe.
+
+    A media file typically has one video track but can have several (alt
+    camera angles, attached thumbnail images reported as still-image streams,
+    etc.), hence the ``tuple[VideoTrack, ...]`` on MediaInfo.
+    """
+
+    index: int
+    codec: str | None  # h264, hevc, av1, …
+    width: int | None
+    height: int | None
+    is_default: bool = False
+
+    @property
+    def resolution(self) -> str | None:
+        """
+        Best-effort resolution string: 2160p, 1080p, 720p, …
+
+        Width takes priority over height to handle widescreen/cinema crops
+        (e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
+        width is unavailable.
+        """
+        match (self.width, self.height):
+            case (None, None):
+                return None
+            case (w, h) if w is not None:
+                match True:
+                    case _ if w >= 3840:
+                        return "2160p"
+                    case _ if w >= 1920:
+                        return "1080p"
+                    case _ if w >= 1280:
+                        return "720p"
+                    case _ if w >= 720:
+                        return "576p"
+                    case _ if w >= 640:
+                        return "480p"
+                    case _:
+                        return f"{h}p" if h else f"{w}w"
+            case (None, h):
+                match True:
+                    case _ if h >= 2160:
+                        return "2160p"
+                    case _ if h >= 1080:
+                        return "1080p"
+                    case _ if h >= 720:
+                        return "720p"
+                    case _ if h >= 576:
+                        return "576p"
+                    case _ if h >= 480:
+                        return "480p"
+                    case _:
+                        return f"{h}p"
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# MediaInfo — assembles video/audio/subtitle tracks for a media file
+# ─────────────────────────────────────────────────────────────────────────────
+
+
+@dataclass(frozen=True)
+class MediaInfo:
+    """
+    File-level media metadata extracted by ffprobe — immutable snapshot.
+
+    Symmetric design: every stream type is a tuple of typed track objects
+    (immutable on purpose — a MediaInfo is a frozen view of one ffprobe run,
+    not a mutable collection to append to).
+    Backwards-compatible flat accessors (``resolution``, ``width``, …) read
+    from the first video track when present.
+    """
+
+    video_tracks: tuple[VideoTrack, ...] = field(default_factory=tuple)
+    audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
+    subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
+
+    # File-level (from ffprobe ``format`` block, not from any single stream)
+    duration_seconds: float | None = None
+    bitrate_kbps: int | None = None
+
+    # ──────────────────────────────────────────────────────────────────────
+    # Video conveniences — read the first video track
+    # ──────────────────────────────────────────────────────────────────────
+
+    @property
+    def primary_video(self) -> VideoTrack | None:
+        return self.video_tracks[0] if self.video_tracks else None
+
+    @property
+    def width(self) -> int | None:
+        v = self.primary_video
+        return v.width if v else None
+
+    @property
+    def height(self) -> int | None:
+        v = self.primary_video
+        return v.height if v else None
+
+    @property
+    def video_codec(self) -> str | None:
+        v = self.primary_video
+        return v.codec if v else None
+
+    @property
+    def resolution(self) -> str | None:
+        v = self.primary_video
+        return v.resolution if v else None
+
+    # ──────────────────────────────────────────────────────────────────────
+    # Audio conveniences
+    # ──────────────────────────────────────────────────────────────────────
+
+    @property
+    def audio_languages(self) -> list[str]:
+        """Unique audio languages across all tracks, in track order (ISO 639-2)."""
+        seen: set[str] = set()
+        result: list[str] = []
+        for track in self.audio_tracks:
+            if track.language and track.language not in seen:
+                seen.add(track.language)
+                result.append(track.language)
+        return result
+
+    @property
+    def is_multi_audio(self) -> bool:
+        """True if more than one audio language is present."""
+        return len(self.audio_languages) > 1
+
+
+# ─────────────────────────────────────────────────────────────────────────────
+# Language matching — shared helper + mixin
+# ─────────────────────────────────────────────────────────────────────────────
+
+
+def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
+    """
+    Match a track's language string against a query (contract "C+").
+
+      * ``Language`` query → matches if the track string is any known
+        representation of that Language (delegates to ``Language.matches``).
+        Powerful, cross-format mode.
+      * ``str`` query → case-insensitive direct comparison against
+        ``track_lang``. Simple, no normalization, no registry lookup.
+
+    Callers needing cross-format resolution (``"fr"`` ↔ ``"fre"`` ↔
+    ``"french"``) should resolve their string through a ``LanguageRegistry``
+    once and pass the resulting ``Language``.
+    """
+    if track_lang is None:
+        return False
+    if isinstance(query, Language):
+        return query.matches(track_lang)
+    if isinstance(query, str):
+        return track_lang.lower().strip() == query.lower().strip()
+    return False
+
+
+class MediaWithTracks:
+    """
+    Mixin providing audio/subtitle helpers for entities with track collections.
+
+    Hosts must expose two attributes:
+
+    * ``audio_tracks: tuple[AudioTrack, ...]``
+    * ``subtitle_tracks: tuple[SubtitleTrack, ...]``
+
+    The helpers follow the "C+" matching contract: pass a :class:`Language`
+    for cross-format matching, or a ``str`` for case-insensitive comparison.
+    """
+
+    # These attributes are provided by the host entity (Movie, Episode, …).
+    # Declared here only for type-checkers and to make the contract explicit.
+    audio_tracks: tuple[AudioTrack, ...]
+    subtitle_tracks: tuple[SubtitleTrack, ...]
+
+    # ── Audio helpers ──────────────────────────────────────────────────────
+
+    def has_audio_in(self, lang: str | Language) -> bool:
+        """True if at least one audio track is in the given language."""
+        return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
+
+    def audio_languages(self) -> list[str]:
+        """Unique audio languages across all tracks, in track order."""
+        seen: set[str] = set()
+        result: list[str] = []
+        for t in self.audio_tracks:
+            if t.language and t.language not in seen:
+                seen.add(t.language)
+                result.append(t.language)
+        return result
+
+    # ── Subtitle helpers ───────────────────────────────────────────────────
+
+    def has_subtitles_in(self, lang: str | Language) -> bool:
+        """True if at least one subtitle track is in the given language."""
+        return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
+
+    def has_forced_subs(self) -> bool:
+        """True if at least one subtitle track is flagged as forced."""
+        return any(t.is_forced for t in self.subtitle_tracks)
+
+    def subtitle_languages(self) -> list[str]:
+        """Unique subtitle languages across all tracks, in track order."""
+        seen: set[str] = set()
+        result: list[str] = []
+        for t in self.subtitle_tracks:
+            if t.language and t.language not in seen:
+                seen.add(t.language)
+                result.append(t.language)
+        return result
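
A minimal sketch of how a frozen host entity might satisfy the tuple-based mixin contract and project probe results through `dataclasses.replace`. `MiniAudioTrack`, `TracksMixin`, `MiniMovie` and `lang_matches` are hypothetical simplifications; the sketch covers only the `str` branch of the "C+" contract and leaves out the identity-based equality (`eq=False`) that the real `Movie` and `Episode` keep.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniAudioTrack:
    """Hypothetical cut-down AudioTrack (illustration only)."""

    index: int
    language: Optional[str]


def lang_matches(track_lang: Optional[str], query: str) -> bool:
    # str mode only: case-insensitive direct comparison, no registry lookup.
    if track_lang is None:
        return False
    return track_lang.lower().strip() == query.lower().strip()


class TracksMixin:
    # Annotation only; the host dataclass provides the actual field.
    audio_tracks: Tuple[MiniAudioTrack, ...]

    def has_audio_in(self, lang: str) -> bool:
        return any(lang_matches(t.language, lang) for t in self.audio_tracks)


@dataclass(frozen=True)
class MiniMovie(TracksMixin):
    title: str
    audio_tracks: Tuple[MiniAudioTrack, ...] = ()


movie = MiniMovie(title="Inception")

# Projecting (assumed) probe output: build a new entity, never append.
probed = replace(
    movie,
    audio_tracks=(MiniAudioTrack(0, "eng"), MiniAudioTrack(1, "fre")),
)

print(movie.has_audio_in("eng"))
print(probed.has_audio_in("FRE"))
print(probed.has_audio_in("fr"))  # str mode does no cross-format resolution
```

The last call shows why the docstring recommends resolving a string through a `LanguageRegistry` first: in plain `str` mode, `"fr"` and `"fre"` are simply different strings.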
@@ -1,21 +0,0 @@
-"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
-
-These are the **container-view** dataclasses, populated from ffprobe output and
-used across the project to describe the content of a media file.
-"""
-
-from .audio import AudioTrack
-from .info import MediaInfo
-from .matching import track_lang_matches
-from .subtitle import SubtitleTrack
-from .tracks_mixin import MediaWithTracks
-from .video import VideoTrack
-
-__all__ = [
-    "AudioTrack",
-    "MediaInfo",
-    "MediaWithTracks",
-    "SubtitleTrack",
-    "VideoTrack",
-    "track_lang_matches",
-]
@@ -1,17 +0,0 @@
-"""AudioTrack — a single audio stream as reported by ffprobe."""
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-
-
-@dataclass(frozen=True)
-class AudioTrack:
-    """A single audio track as reported by ffprobe."""
-
-    index: int
-    codec: str | None  # aac, ac3, eac3, dts, truehd, flac, …
-    channels: int | None  # 2, 6 (5.1), 8 (7.1), …
-    channel_layout: str | None  # stereo, 5.1, 7.1, …
-    language: str | None  # ISO 639-2: fre, eng, und, …
-    is_default: bool = False
@@ -1,78 +0,0 @@
-"""MediaInfo — assembles video, audio and subtitle tracks for a media file."""
-
-from __future__ import annotations
-
-from dataclasses import dataclass, field
-
-from .audio import AudioTrack
-from .subtitle import SubtitleTrack
-from .video import VideoTrack
-
-
-@dataclass(frozen=True)
-class MediaInfo:
-    """
-    File-level media metadata extracted by ffprobe — immutable snapshot.
-
-    Symmetric design: every stream type is a tuple of typed track objects
-    (immutable on purpose — a MediaInfo is a frozen view of one ffprobe run,
-    not a mutable collection to append to).
-    Backwards-compatible flat accessors (``resolution``, ``width``, …) read
-    from the first video track when present.
-    """
-
-    video_tracks: tuple[VideoTrack, ...] = field(default_factory=tuple)
-    audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
-    subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
-
-    # File-level (from ffprobe ``format`` block, not from any single stream)
-    duration_seconds: float | None = None
-    bitrate_kbps: int | None = None
-
-    # ──────────────────────────────────────────────────────────────────────
-    # Video conveniences — read the first video track
-    # ──────────────────────────────────────────────────────────────────────
-
-    @property
-    def primary_video(self) -> VideoTrack | None:
-        return self.video_tracks[0] if self.video_tracks else None
-
-    @property
-    def width(self) -> int | None:
-        v = self.primary_video
-        return v.width if v else None
-
-    @property
-    def height(self) -> int | None:
-        v = self.primary_video
-        return v.height if v else None
-
-    @property
-    def video_codec(self) -> str | None:
-        v = self.primary_video
-        return v.codec if v else None
-
-    @property
-    def resolution(self) -> str | None:
-        v = self.primary_video
-        return v.resolution if v else None
-
-    # ──────────────────────────────────────────────────────────────────────
-    # Audio conveniences
-    # ──────────────────────────────────────────────────────────────────────
-
-    @property
-    def audio_languages(self) -> list[str]:
-        """Unique audio languages across all tracks (ISO 639-2)."""
-        seen: set[str] = set()
-        result: list[str] = []
-        for track in self.audio_tracks:
-            if track.language and track.language not in seen:
-                seen.add(track.language)
-                result.append(track.language)
-        return result
-
-    @property
-    def is_multi_audio(self) -> bool:
-        """True if more than one audio language is present."""
-        return len(self.audio_languages) > 1
@@ -1,33 +0,0 @@
-"""Language-matching helper shared by media-bearing entities.
-
-Both ``Episode`` and ``Movie`` carry ``audio_tracks`` / ``subtitle_tracks`` and
-need to answer "do I have audio in language X?". The matching contract is the
-same in both cases — keep it in one place.
-"""
-
-from __future__ import annotations
-
-from ..value_objects import Language
-
-
-def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
-    """
-    Match a track's language string against a query (contract "C+").
-
-      * ``Language`` query → matches if the track string is any known
-        representation of that Language (delegates to ``Language.matches``).
-        Powerful, cross-format mode.
-      * ``str`` query → case-insensitive direct comparison against
-        ``track_lang``. Simple, no normalization, no registry lookup.
-
-    Callers needing cross-format resolution (``"fr"`` ↔ ``"fre"`` ↔
-    ``"french"``) should resolve their string through a ``LanguageRegistry``
-    once and pass the resulting ``Language``.
-    """
-    if track_lang is None:
-        return False
-    if isinstance(query, Language):
-        return query.matches(track_lang)
-    if isinstance(query, str):
-        return track_lang.lower().strip() == query.lower().strip()
-    return False
@@ -1,25 +0,0 @@
-"""SubtitleTrack — a single embedded subtitle stream as reported by ffprobe.
-
-This is the **container-view** representation (ffprobe output) used uniformly
-across the project to describe a subtitle stream embedded in a media file.
-
-Not to be confused with ``alfred.domain.subtitles.entities.SubtitleCandidate``
-which models a subtitle being **scanned/matched** (with confidence, raw tokens,
-file path, etc.). The two coexist by design — they describe the same real-world
-concept seen from two different bounded contexts.
-"""
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-
-
-@dataclass(frozen=True)
-class SubtitleTrack:
-    """A single embedded subtitle track as reported by ffprobe."""
-
-    index: int
-    codec: str | None  # subrip, ass, hdmv_pgs_subtitle, …
-    language: str | None  # ISO 639-2: fre, eng, und, …
-    is_default: bool = False
-    is_forced: bool = False
@@ -1,77 +0,0 @@
-"""Mixin shared by entities that carry audio + subtitle tracks.
-
-Both ``Movie`` and ``Episode`` carry a ``list[AudioTrack]`` plus a
-``list[SubtitleTrack]`` and answer the same 5 queries about them (language
-presence, unique languages, forced flag). Keep that behavior in one place so a
-fix in one is a fix in both.
-
-The mixin is plain Python (no dataclass machinery) so it composes cleanly with
-``@dataclass`` entities — it only reads ``self.audio_tracks`` and
-``self.subtitle_tracks`` which the host class provides as fields.
-"""
-
-from __future__ import annotations
-
-from typing import TYPE_CHECKING
-
-from ..value_objects import Language
-from .matching import track_lang_matches
-
-if TYPE_CHECKING:
-    from .audio import AudioTrack
-    from .subtitle import SubtitleTrack
-
-
-class MediaWithTracks:
-    """
-    Mixin providing audio/subtitle helpers for entities with track collections.
-
-    Hosts must expose two attributes:
-
-    * ``audio_tracks: list[AudioTrack]``
-    * ``subtitle_tracks: list[SubtitleTrack]``
-
-    The helpers follow the "C+" matching contract: pass a :class:`Language`
-    for cross-format matching, or a ``str`` for case-insensitive comparison.
-    """
-
-    # These attributes are provided by the host entity (Movie, Episode, …).
-    # Declared here only for type-checkers and to make the contract explicit.
-    audio_tracks: list["AudioTrack"]
-    subtitle_tracks: list["SubtitleTrack"]
-
-    # ── Audio helpers ──────────────────────────────────────────────────────
-
-    def has_audio_in(self, lang: str | Language) -> bool:
-        """True if at least one audio track is in the given language."""
-        return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
-
-    def audio_languages(self) -> list[str]:
-        """Unique audio languages across all tracks, in track order."""
-        seen: set[str] = set()
-        result: list[str] = []
-        for t in self.audio_tracks:
-            if t.language and t.language not in seen:
-                seen.add(t.language)
-                result.append(t.language)
-        return result
-
-    # ── Subtitle helpers ───────────────────────────────────────────────────
-
-    def has_subtitles_in(self, lang: str | Language) -> bool:
-        """True if at least one subtitle track is in the given language."""
-        return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
-
-    def has_forced_subs(self) -> bool:
-        """True if at least one subtitle track is flagged as forced."""
-        return any(t.is_forced for t in self.subtitle_tracks)
-
-    def subtitle_languages(self) -> list[str]:
-        """Unique subtitle languages across all tracks, in track order."""
-        seen: set[str] = set()
-        result: list[str] = []
-        for t in self.subtitle_tracks:
-            if t.language and t.language not in seen:
-                seen.add(t.language)
-                result.append(t.language)
-        return result
@@ -1,62 +0,0 @@
-"""VideoTrack — a single video stream as reported by ffprobe."""
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-
-
-@dataclass(frozen=True)
-class VideoTrack:
-    """A single video track as reported by ffprobe.
-
-    A media file typically has one video track but can have several (alt
-    camera angles, attached thumbnail images reported as still-image streams,
-    etc.), hence the list[VideoTrack] on MediaInfo.
-    """
-
-    index: int
-    codec: str | None  # h264, hevc, av1, …
-    width: int | None
-    height: int | None
-    is_default: bool = False
-
-    @property
-    def resolution(self) -> str | None:
-        """
-        Best-effort resolution string: 2160p, 1080p, 720p, …
-
-        Width takes priority over height to handle widescreen/cinema crops
-        (e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
-        width is unavailable.
-        """
-        match (self.width, self.height):
-            case (None, None):
-                return None
-            case (w, h) if w is not None:
-                match True:
-                    case _ if w >= 3840:
-                        return "2160p"
-                    case _ if w >= 1920:
-                        return "1080p"
-                    case _ if w >= 1280:
-                        return "720p"
-                    case _ if w >= 720:
-                        return "576p"
-                    case _ if w >= 640:
-                        return "480p"
-                    case _:
-                        return f"{h}p" if h else f"{w}w"
-            case (None, h):
-                match True:
-                    case _ if h >= 2160:
-                        return "2160p"
-                    case _ if h >= 1080:
-                        return "1080p"
-                    case _ if h >= 720:
-                        return "720p"
-                    case _ if h >= 576:
-                        return "576p"
-                    case _ if h >= 480:
-                        return "480p"
-                    case _:
-                        return f"{h}p"
@@ -7,11 +7,13 @@ Protocol without going through real I/O.
 """

 from .filesystem_scanner import FileEntry, FilesystemScanner
+from .language_repository import LanguageRepository
 from .media_prober import MediaProber, SubtitleStreamInfo

 __all__ = [
    "FileEntry",
    "FilesystemScanner",
+    "LanguageRepository",
    "MediaProber",
    "SubtitleStreamInfo",
 ]
@@ -0,0 +1,36 @@
+"""LanguageRepository port — abstracts canonical language lookup.
+
+The adapter (typically loading from ISO 639 YAML knowledge) maps a wide
+range of raw forms (codes, English/native names, aliases) onto the
+canonical :class:`Language` value object. Domain code accepts the port
+via constructor injection; tests can pass a small in-memory fake.
+"""
+
+from __future__ import annotations
+
+from typing import Protocol
+
+from alfred.domain.shared.value_objects import Language
+
+
+class LanguageRepository(Protocol):
+    """Canonical language lookup."""
+
+    def from_iso(self, code: str) -> Language | None:
+        """Look up by canonical ISO 639-2/B code (case-insensitive)."""
+        ...
+
+    def from_any(self, raw: str) -> Language | None:
+        """Look up by any known representation: ISO code, name, alias.
+
+        Case-insensitive. Returns ``None`` when the raw form is unknown.
+        """
+        ...
+
+    def all(self) -> list[Language]:
+        """Return all known languages, in a stable order."""
+        ...
+
+    def __contains__(self, raw: str) -> bool: ...
+
+    def __len__(self) -> int: ...
@@ -9,7 +9,10 @@ from __future__ import annotations

 from dataclasses import dataclass
 from pathlib import Path
-from typing import Protocol
+from typing import TYPE_CHECKING, Protocol
+
+if TYPE_CHECKING:
+    from alfred.domain.shared.media import MediaInfo


 @dataclass(frozen=True)
@@ -37,3 +40,13 @@ class MediaProber(Protocol):
        no subtitle streams. Adapters must not raise.
        """
        ...
+
+    def probe(self, video: Path) -> MediaInfo | None:
+        """Return the full :class:`MediaInfo` for ``video``, or ``None``.
+
+        Covers all stream families (video, audio, subtitle) plus
+        file-level duration / bitrate. ``None`` signals that ffprobe is
+        unavailable or the file can't be read — adapters must not
+        raise.
+        """
+        ...
@@ -1,5 +1,7 @@
 """Shared value objects used across multiple domains."""

+from __future__ import annotations
+
 import re
 from dataclasses import dataclass
 from pathlib import Path
@@ -43,29 +45,21 @@ class ImdbId:
 @dataclass(frozen=True)
 class FilePath:
    """
-    Value object representing a file path with validation.
+    Value object representing a file path.

-    Ensures the path is valid and optionally checks existence.
+    Accepts either ``str`` or :class:`pathlib.Path` at construction;
+    the value is normalized to ``Path`` in ``__post_init__``.
    """

    value: Path

-    def __init__(self, path: str | Path):
-        """
-        Initialize FilePath.
-
-        Args:
-            path: String or Path object representing the file path
-        """
-        if isinstance(path, str):
-            path_obj = Path(path)
-        elif isinstance(path, Path):
-            path_obj = path
-        else:
-            raise ValidationError(f"Path must be str or Path, got {type(path)}")
-
-        # Use object.__setattr__ because dataclass is frozen
-        object.__setattr__(self, "value", path_obj)
+    def __post_init__(self) -> None:
+        if isinstance(self.value, Path):
+            return
+        if isinstance(self.value, str):
+            object.__setattr__(self, "value", Path(self.value))
+            return
+        raise ValidationError(f"Path must be str or Path, got {type(self.value)}")

    def __str__(self) -> str:
        return str(self.value)
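
The move from a hand-written `__init__` to `__post_init__` can be checked in isolation. The sketch below reproduces the new `FilePath` body from this hunk; the `ValidationError` class is a stand-in for the project's exception, and the sample path is made up.

```python
from dataclasses import dataclass
from pathlib import Path


class ValidationError(ValueError):
    """Stand-in for the project's ValidationError (illustration only)."""


@dataclass(frozen=True)
class FilePath:
    value: Path

    def __post_init__(self) -> None:
        if isinstance(self.value, Path):
            return
        if isinstance(self.value, str):
            # Frozen dataclasses reject normal assignment, so the one-time
            # normalization goes through object.__setattr__.
            object.__setattr__(self, "value", Path(self.value))
            return
        raise ValidationError(f"Path must be str or Path, got {type(self.value)}")


from_str = FilePath("/media/movie.mkv")
from_path = FilePath(Path("/media/movie.mkv"))
```

One consequence worth checking in callers: with the generated `__init__`, the keyword is now `value=` rather than `path=`, so any call site that used `FilePath(path=...)` would need updating. Positional construction is unaffected.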
@@ -150,19 +144,49 @@ class Language:
            raise ValidationError(
                f"Language.iso must be a 3-letter ISO 639-2/B code, got {self.iso!r}"
            )
-        # Normalize iso to lowercase
-        object.__setattr__(self, "iso", self.iso.lower())
-        # Normalize aliases to a tuple of lowercase strings (dedup, preserve order)
+        if self.iso != self.iso.lower():
+            raise ValidationError(
+                f"Language.iso must be lowercase, got {self.iso!r} — "
+                f"use Language.from_raw() to construct from arbitrary input"
+            )
+        for alias in self.aliases:
+            if not isinstance(alias, str) or alias != alias.lower().strip() or not alias:
+                raise ValidationError(
+                    f"Language.aliases must be lowercase non-empty strings, "
+                    f"got {alias!r} — use Language.from_raw() to normalize"
+                )
+
+    @classmethod
+    def from_raw(
+        cls,
+        iso: str,
+        english_name: str,
+        native_name: str,
+        aliases: tuple[str, ...] | list[str] = (),
+    ) -> Language:
+        """
+        Construct a Language from arbitrary (possibly un-normalized) input.
+
+        Use this factory when loading from external sources (YAML, user input,
+        third-party APIs) — it lowercases the iso code and normalizes/dedups
+        the alias tuple. The direct constructor is strict and rejects
+        un-normalized input.
+        """
        seen: set[str] = set()
        normalized: list[str] = []
-        for alias in self.aliases:
+        for alias in aliases:
            if not isinstance(alias, str):
                continue
            a = alias.lower().strip()
            if a and a not in seen:
                seen.add(a)
                normalized.append(a)
-        object.__setattr__(self, "aliases", tuple(normalized))
+        return cls(
+            iso=iso.lower(),
+            english_name=english_name,
+            native_name=native_name,
+            aliases=tuple(normalized),
+        )

    def matches(self, raw: str) -> bool:
        """
@@ -1,11 +1,12 @@
 """Subtitles domain — subtitle identification, classification and placement."""

 from .aggregates import SubtitleRuleSet
-from .entities import MediaSubtitleMetadata, SubtitleCandidate
+from .entities import MediaSubtitleMetadata, SubtitleScanResult
 from .exceptions import SubtitleNotFound
 from .services import PatternDetector, SubtitleIdentifier, SubtitleMatcher
 from .value_objects import (
    RuleScope,
+    RuleScopeLevel,
    ScanStrategy,
    SubtitleFormat,
    SubtitleLanguage,
@@ -16,7 +17,7 @@ from .value_objects import (
 )

 __all__ = [
-    "SubtitleCandidate",
+    "SubtitleScanResult",
    "MediaSubtitleMetadata",
    "SubtitleRuleSet",
    "SubtitleIdentifier",
@@ -30,5 +31,6 @@ __all__ = [
    "TypeDetectionMethod",
    "SubtitleMatchingRules",
    "RuleScope",
+    "RuleScopeLevel",
    "SubtitleNotFound",
 ]
@@ -4,7 +4,7 @@ from dataclasses import dataclass, field
 from typing import Any

 from ..shared.value_objects import ImdbId
-from .value_objects import RuleScope, SubtitleMatchingRules
+from .value_objects import RuleScope, RuleScopeLevel, SubtitleMatchingRules


 @dataclass
@@ -86,10 +86,13 @@ class SubtitleRuleSet:
        if self._min_confidence is not None:
            delta["min_confidence"] = self._min_confidence
        return {
-            "scope": {"level": self.scope.level, "identifier": self.scope.identifier},
+            "scope": {
+                "level": self.scope.level.value,
+                "identifier": self.scope.identifier,
+            },
            "override": delta,
        }

    @classmethod
    def global_default(cls) -> SubtitleRuleSet:
-        return cls(scope=RuleScope(level="global"))
+        return cls(scope=RuleScope(level=RuleScopeLevel.GLOBAL))
@@ -12,16 +12,18 @@ from .value_objects import (


 @dataclass
-class SubtitleCandidate:
+class SubtitleScanResult:
    """
-    A subtitle being scanned and matched — either an external file or an embedded stream.
+    A subtitle observed during a scan — either an external file or an embedded stream.

    Unlike ``alfred.domain.shared.media.SubtitleTrack`` (the pure container-view
-    populated from ffprobe), a SubtitleCandidate carries the **flow state** of the
-    subtitle matching pipeline: language/format are typed value objects that may
-    be ``None`` while classification is in progress, ``confidence`` reflects how
-    certain we are, and ``raw_tokens`` holds the filename fragments still under
-    analysis. State evolves: unknown → resolved after user clarification.
+    populated from ffprobe), a ``SubtitleScanResult`` carries the **flow state**
+    of the subtitle matching pipeline: language/format are typed value objects
+    that may be ``None`` while classification is in progress, ``confidence``
+    reflects how certain we are, and ``raw_tokens`` holds the filename fragments
+    still under analysis. State evolves: unknown → resolved after user
+    clarification. The name reflects this — it's the **output of a scan pass**,
+    not a value object.
    """

    # Classification (may be None if not yet resolved)
@@ -72,7 +74,7 @@ class SubtitleCandidate:
            if self.is_embedded
            else str(self.file_path.name if self.file_path else "?")
        )
-        return f"SubtitleCandidate({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
+        return f"SubtitleScanResult({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"


 @dataclass
@@ -84,14 +86,14 @@ class MediaSubtitleMetadata:

    media_id: ImdbId | None
    media_type: str  # "movie" | "tv_show"
-    embedded_tracks: list[SubtitleCandidate] = field(default_factory=list)
-    external_tracks: list[SubtitleCandidate] = field(default_factory=list)
+    embedded_tracks: list[SubtitleScanResult] = field(default_factory=list)
+    external_tracks: list[SubtitleScanResult] = field(default_factory=list)
    release_group: str | None = None
    detected_pattern_id: str | None = None  # pattern id from knowledge base
    pattern_confirmed: bool = False

    @property
-    def all_tracks(self) -> list[SubtitleCandidate]:
+    def all_tracks(self) -> list[SubtitleScanResult]:
        return self.embedded_tracks + self.external_tracks

    @property
@@ -99,5 +101,5 @@ class MediaSubtitleMetadata:
        return len(self.embedded_tracks) + len(self.external_tracks)

    @property
-    def unresolved_tracks(self) -> list[SubtitleCandidate]:
+    def unresolved_tracks(self) -> list[SubtitleScanResult]:
        return [t for t in self.external_tracks if t.language is None]
@@ -7,7 +7,7 @@ from pathlib import Path
 from ...shared.ports import FilesystemScanner, MediaProber
 from ..ports import SubtitleKnowledge
 from ...shared.value_objects import ImdbId
-from ..entities import MediaSubtitleMetadata, SubtitleCandidate
+from ..entities import MediaSubtitleMetadata, SubtitleScanResult
 from ..value_objects import ScanStrategy, SubtitlePattern, SubtitleType

 logger = logging.getLogger(__name__)
@@ -94,7 +94,7 @@ class SubtitleIdentifier:
    # Embedded tracks — via MediaProber
    # ------------------------------------------------------------------

-    def _scan_embedded(self, video_path: Path) -> list[SubtitleCandidate]:
+    def _scan_embedded(self, video_path: Path) -> list[SubtitleScanResult]:
        streams = self.prober.list_subtitle_streams(video_path)

        tracks = []
@@ -111,7 +111,7 @@ class SubtitleIdentifier:
                stype = SubtitleType.STANDARD

            tracks.append(
-                SubtitleCandidate(
+                SubtitleScanResult(
                    language=lang,
                    format=None,
                    subtitle_type=stype,
@@ -131,7 +131,7 @@ class SubtitleIdentifier:

    def _scan_external(
        self, video_path: Path, pattern: SubtitlePattern
-    ) -> list[SubtitleCandidate]:
+    ) -> list[SubtitleScanResult]:
        strategy = pattern.scan_strategy
        episode_stem: str | None = None

@@ -200,7 +200,7 @@ class SubtitleIdentifier:
        entries: list,
        pattern: SubtitlePattern,
        episode_stem: str | None = None,
-    ) -> list[SubtitleCandidate]:
+    ) -> list[SubtitleScanResult]:
        tracks = [
            self._classify_single(entry, episode_stem=episode_stem) for entry in entries
        ]
@@ -214,7 +214,7 @@ class SubtitleIdentifier:

    def _classify_single(
        self, entry, episode_stem: str | None = None
-    ) -> SubtitleCandidate:
+    ) -> SubtitleScanResult:
        fmt = self.kb.format_for_extension(entry.suffix)
        tokens = (
            _tokenize_suffix(entry.stem, episode_stem)
@@ -253,7 +253,7 @@ class SubtitleIdentifier:
        if entry.suffix.lower() == ".srt":
            entry_count = _count_entries(self.scanner.read_text(entry.path))

-        return SubtitleCandidate(
+        return SubtitleScanResult(
            language=language,
            format=fmt,
            subtitle_type=subtitle_type,
@@ -266,8 +266,8 @@ class SubtitleIdentifier:
        )

    def _disambiguate_by_size(
-        self, tracks: list[SubtitleCandidate]
-    ) -> list[SubtitleCandidate]:
+        self, tracks: list[SubtitleScanResult]
+    ) -> list[SubtitleScanResult]:
        """
        When multiple tracks share the same language and type is UNKNOWN/STANDARD,
        the one with the most entries (lines) is SDH, the smallest is FORCED if
@@ -277,7 +277,7 @@ class SubtitleIdentifier:
        """

        # Group by language code
-        lang_groups: dict[str, list[SubtitleCandidate]] = {}
+        lang_groups: dict[str, list[SubtitleScanResult]] = {}
        for track in tracks:
            key = track.language.code if track.language else "__unknown__"
            lang_groups.setdefault(key, []).append(track)
@@ -306,6 +306,6 @@ class SubtitleIdentifier:

        return result

-    def _set_type(self, track: SubtitleCandidate, stype: SubtitleType) -> None:
+    def _set_type(self, track: SubtitleScanResult, stype: SubtitleType) -> None:
        """Mutate track type in-place."""
        track.subtitle_type = stype
@@ -2,7 +2,7 @@

 import logging

-from ..entities import SubtitleCandidate
+from ..entities import SubtitleScanResult
 from ..value_objects import SubtitleMatchingRules

 logger = logging.getLogger(__name__)
@@ -10,7 +10,7 @@ logger = logging.getLogger(__name__)

 class SubtitleMatcher:
    """
-    Filters a list of SubtitleCandidate against effective SubtitleMatchingRules.
+    Filters a list of SubtitleScanResult against effective SubtitleMatchingRules.

    Returns matched tracks (pass all filters, confidence >= min_confidence)
    and unresolved tracks (need user clarification).
@@ -21,14 +21,14 @@ class SubtitleMatcher:

    def match(
        self,
-        tracks: list[SubtitleCandidate],
+        tracks: list[SubtitleScanResult],
        rules: SubtitleMatchingRules,
-    ) -> tuple[list[SubtitleCandidate], list[SubtitleCandidate]]:
+    ) -> tuple[list[SubtitleScanResult], list[SubtitleScanResult]]:
        """
        Returns (matched, unresolved).
        """
-        matched: list[SubtitleCandidate] = []
-        unresolved: list[SubtitleCandidate] = []
+        matched: list[SubtitleScanResult] = []
+        unresolved: list[SubtitleScanResult] = []

        for track in tracks:
            if track.is_embedded:
@@ -51,7 +51,7 @@ class SubtitleMatcher:
        return matched, unresolved

    def _passes_filters(
-        self, track: SubtitleCandidate, rules: SubtitleMatchingRules
+        self, track: SubtitleScanResult, rules: SubtitleMatchingRules
    ) -> bool:
        # Language filter
        if rules.preferred_languages:
@@ -76,14 +76,14 @@ class SubtitleMatcher:

    def _resolve_conflicts(
        self,
-        tracks: list[SubtitleCandidate],
+        tracks: list[SubtitleScanResult],
        rules: SubtitleMatchingRules,
-    ) -> list[SubtitleCandidate]:
+    ) -> list[SubtitleScanResult]:
        """
        When multiple tracks have same language + type, keep only the best one
        according to format_priority. If no format_priority applies, keep the first.
        """
-        seen: dict[tuple, SubtitleCandidate] = {}
+        seen: dict[tuple, SubtitleScanResult] = {}

        for track in tracks:
            lang = track.language.code if track.language else None
@@ -106,8 +106,8 @@ class SubtitleMatcher:

    def _prefer(
        self,
-        candidate: SubtitleCandidate,
-        existing: SubtitleCandidate,
+        candidate: SubtitleScanResult,
+        existing: SubtitleScanResult,
        format_priority: list[str],
    ) -> bool:
        """Return True if candidate is preferable to existing."""
@@ -1,9 +1,9 @@
 """Subtitle service utilities."""

-from ..entities import SubtitleCandidate
+from ..entities import SubtitleScanResult


-def available_subtitles(tracks: list[SubtitleCandidate]) -> list[SubtitleCandidate]:
+def available_subtitles(tracks: list[SubtitleScanResult]) -> list[SubtitleScanResult]:
    """
    Return the distinct subtitle tracks available, deduped by (language, type).

@@ -11,7 +11,7 @@ def available_subtitles(tracks: list[SubtitleCandidate]) -> list[SubtitleCandida
    preferences — e.g. eng, eng.sdh, fra all show up as separate entries.
    """
    seen: set[tuple] = set()
-    result: list[SubtitleCandidate] = []
+    result: list[SubtitleScanResult] = []
    for track in tracks:
        lang = track.language.code if track.language else None
        key = (lang, track.subtitle_type)
@@ -83,9 +83,20 @@ class SubtitleMatchingRules:
    min_confidence: float = 0.7


+class RuleScopeLevel(str, Enum):
+    """At which level a subtitle rule set applies."""
+
+    GLOBAL = "global"
+    RELEASE_GROUP = "release_group"
+    MOVIE = "movie"
+    SHOW = "show"
+    SEASON = "season"
+    EPISODE = "episode"
+
+
 @dataclass(frozen=True)
 class RuleScope:
    """At which level a rule set applies."""

-    level: str  # "global" | "release_group" | "movie" | "show" | "season" | "episode"
+    level: RuleScopeLevel
    identifier: str | None = None  # imdb_id, group name, "S01", "S01E03"…
@@ -47,16 +47,19 @@ from .value_objects import (
 # ════════════════════════════════════════════════════════════════════════════


-@dataclass(eq=False)
+@dataclass(frozen=True, eq=False)
 class Episode(MediaWithTracks):
    """
    A single episode of a TV show — leaf of the TVShow aggregate.

    Carries the file metadata (path, size) and the discovered tracks
-    (audio + subtitle). Track lists are populated by the ffprobe + subtitle
+    (audio + subtitle). Track tuples are populated by the ffprobe + subtitle
    scan pipeline; they may be empty when the episode is known but not yet
    scanned, or when no file is downloaded yet.

+    Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
+    onto a new instance.
+
    Equality is identity-based within the aggregate: two ``Episode`` instances
    are equal iff they share the same ``(season_number, episode_number)``,
    regardless of title/file/track contents. The root TVShow guarantees
@@ -68,17 +71,21 @@ class Episode(MediaWithTracks):
    title: str
    file_path: FilePath | None = None
    file_size: FileSize | None = None
-    audio_tracks: list[AudioTrack] = field(default_factory=list)
-    subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
+    audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
+    subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        # Coerce numbers if raw ints were passed
        if not isinstance(self.season_number, SeasonNumber):
            if isinstance(self.season_number, int):
-                self.season_number = SeasonNumber(self.season_number)
+                object.__setattr__(
+                    self, "season_number", SeasonNumber(self.season_number)
+                )
        if not isinstance(self.episode_number, EpisodeNumber):
            if isinstance(self.episode_number, int):
-                self.episode_number = EpisodeNumber(self.episode_number)
+                object.__setattr__(
+                    self, "episode_number", EpisodeNumber(self.episode_number)
+                )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Episode):
@@ -1,121 +0,0 @@
-"""ffprobe — infrastructure adapter for extracting MediaInfo from a video file."""
-
-from __future__ import annotations
-
-import json
-import logging
-import subprocess
-from pathlib import Path
-
-from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
-
-logger = logging.getLogger(__name__)
-
-_FFPROBE_CMD = [
-    "ffprobe",
-    "-v",
-    "quiet",
-    "-print_format",
-    "json",
-    "-show_streams",
-    "-show_format",
-]
-
-
-def probe(path: Path) -> MediaInfo | None:
-    """
-    Run ffprobe on path and return a MediaInfo.
-
-    Returns None if ffprobe is not available or the file cannot be probed.
-    """
-    try:
-        result = subprocess.run(
-            [*_FFPROBE_CMD, str(path)],
-            capture_output=True,
-            text=True,
-            timeout=30,
-            check=False,
-        )
-    except subprocess.TimeoutExpired:
-        logger.warning("ffprobe timed out on %s", path)
-        return None
-
-    if result.returncode != 0:
-        logger.warning("ffprobe failed on %s: %s", path, result.stderr.strip())
-        return None
-
-    try:
-        data = json.loads(result.stdout)
-    except json.JSONDecodeError:
-        logger.warning("ffprobe returned invalid JSON for %s", path)
-        return None
-
-    return _parse(data)
-
-
-def _parse(data: dict) -> MediaInfo:
-    streams = data.get("streams", [])
-    fmt = data.get("format", {})
-
-    # File-level duration/bitrate (ffprobe ``format`` block — independent of streams)
-    duration_seconds: float | None = None
-    bitrate_kbps: int | None = None
-    if "duration" in fmt:
-        try:
-            duration_seconds = float(fmt["duration"])
-        except ValueError:
-            pass
-    if "bit_rate" in fmt:
-        try:
-            bitrate_kbps = int(fmt["bit_rate"]) // 1000
-        except ValueError:
-            pass
-
-    video_tracks: list[VideoTrack] = []
-    audio_tracks: list[AudioTrack] = []
-    subtitle_tracks: list[SubtitleTrack] = []
-
-    for stream in streams:
-        codec_type = stream.get("codec_type")
-
-        if codec_type == "video":
-            video_tracks.append(
-                VideoTrack(
-                    index=stream.get("index", len(video_tracks)),
-                    codec=stream.get("codec_name"),
-                    width=stream.get("width"),
-                    height=stream.get("height"),
-                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
-                )
-            )
-
-        elif codec_type == "audio":
-            audio_tracks.append(
-                AudioTrack(
-                    index=stream.get("index", len(audio_tracks)),
-                    codec=stream.get("codec_name"),
-                    channels=stream.get("channels"),
-                    channel_layout=stream.get("channel_layout"),
-                    language=stream.get("tags", {}).get("language"),
-                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
-                )
-            )
-
-        elif codec_type == "subtitle":
-            subtitle_tracks.append(
-                SubtitleTrack(
-                    index=stream.get("index", len(subtitle_tracks)),
-                    codec=stream.get("codec_name"),
-                    language=stream.get("tags", {}).get("language"),
-                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
-                    is_forced=stream.get("disposition", {}).get("forced", 0) == 1,
-                )
-            )
-
-    return MediaInfo(
-        video_tracks=tuple(video_tracks),
-        audio_tracks=tuple(audio_tracks),
-        subtitle_tracks=tuple(subtitle_tracks),
-        duration_seconds=duration_seconds,
-        bitrate_kbps=bitrate_kbps,
-    )
@@ -4,10 +4,10 @@ from __future__ import annotations

 from pathlib import Path

-from alfred.domain.release.value_objects import _VIDEO_EXTENSIONS
+from alfred.domain.release.ports import ReleaseKnowledge


-def find_video_file(path: Path) -> Path | None:
+def find_video_file(path: Path, kb: ReleaseKnowledge) -> Path | None:
    """
    Return the first video file found at path.

@@ -15,11 +15,12 @@ def find_video_file(path: Path) -> Path | None:
    - If path is a folder — scan recursively, return the first video found
      (sorted by name for determinism, picks S01E01 before S01E02 etc.).
    """
+    video_exts = kb.video_extensions
    if path.is_file():
-        return path if path.suffix.lower() in _VIDEO_EXTENSIONS else None
+        return path if path.suffix.lower() in video_exts else None

    for candidate in sorted(path.rglob("*")):
-        if candidate.is_file() and candidate.suffix.lower() in _VIDEO_EXTENSIONS:
+        if candidate.is_file() and candidate.suffix.lower() in video_exts:
            return candidate

    return None
@@ -87,7 +87,7 @@ class LanguageRegistry:
        merged = _merge_language_entries(builtin, learned)

        for iso, entry in merged.items():
-            language = Language(
+            language = Language.from_raw(
                iso=iso,
                english_name=entry.get("english_name", iso),
                native_name=entry.get("native_name", iso),
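
The registry switches to `Language.from_raw` because the direct constructor now rejects un-normalized input instead of silently fixing it. The sketch below mirrors that split on a simplified stand-in: the length check approximates the real ISO validation, and `ValidationError` is a placeholder for the project's exception.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Stand-in for the project's ValidationError (illustration only)."""


@dataclass(frozen=True)
class Language:
    iso: str
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Strict: validate only, never rewrite fields.
        if len(self.iso) != 3 or self.iso != self.iso.lower():
            raise ValidationError(f"bad iso {self.iso!r}")
        for alias in self.aliases:
            if not isinstance(alias, str) or not alias or alias != alias.lower().strip():
                raise ValidationError(f"bad alias {alias!r}")

    @classmethod
    def from_raw(cls, iso, english_name, native_name, aliases=()) -> "Language":
        # Lenient: lowercase, strip, and dedup while preserving order.
        seen: set[str] = set()
        normalized: list[str] = []
        for alias in aliases:
            if not isinstance(alias, str):
                continue
            a = alias.lower().strip()
            if a and a not in seen:
                seen.add(a)
                normalized.append(a)
        return cls(iso.lower(), english_name, native_name, tuple(normalized))


loaded = Language.from_raw("FRE", "French", "Français", ["FR", " fr ", "French", ""])
```

Any other call site that builds `Language(...)` from YAML or user input would hit the same strictness, so it may be worth grepping for direct constructor calls outside tests.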
@@ -16,9 +16,11 @@ import alfred as _alfred_pkg

 _BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge" / "release"
 _SITES_ROOT = _BUILTIN_ROOT / "sites"
+_GROUPS_ROOT = _BUILTIN_ROOT / "release_groups"
 _LEARNED_ROOT = (
    Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge" / "release"
 )
+_LEARNED_GROUPS_ROOT = _LEARNED_ROOT / "release_groups"


 def _merge(base: dict, overlay: dict) -> dict:
@@ -62,6 +64,15 @@ def load_sources() -> set[str]:
    return set(_load("sources.yaml").get("sources", []))


+def load_distributors() -> set[str]:
+    """Streaming distributor tokens (NF, AMZN, DSNP, …).
+
+    Distinct from ``load_sources()`` — distributors are uppercase scene
+    tags identifying the platform, not the capture origin.
+    """
+    return {t.upper() for t in _load("distributors.yaml").get("distributors", [])}
+
+
 def load_codecs() -> set[str]:
    return set(_load("codecs.yaml").get("codecs", []))

@@ -128,6 +139,88 @@ def load_media_type_tokens() -> dict:
    return _load_sites().get("media_type_tokens", {})


+def load_group_schemas() -> dict:
+    """Load every release-group schema YAML keyed by uppercase group name.
+
+    Builtin schemas in ``alfred/knowledge/release/release_groups/`` are
+    merged with user-learned schemas in
+    ``data/knowledge/release/release_groups/`` (the learned ones win on
+    name collision).
+    """
+    result: dict = {}
+    for root in (_GROUPS_ROOT, _LEARNED_GROUPS_ROOT):
+        if not root.is_dir():
+            continue
+        for path in sorted(root.glob("*.yaml")):
+            data = _read(path)
+            name = data.get("name")
+            if not name:
+                continue
+            result[name.upper()] = data
+    return result
+
+
+def load_scoring() -> dict:
+    """Load the parse-scoring config.
+
+    Returns a dict with three top-level keys: ``weights``, ``penalties``,
+    ``thresholds``. Defaults are baked in so a missing or partial YAML
+    never breaks the parser — only de-tunes it.
+    """
+    raw = _load("scoring.yaml")
+    weights = {
+        "title": 30,
+        "media_type": 20,
+        "year": 15,
+        "season": 10,
+        "episode": 5,
+        "resolution": 5,
+        "source": 5,
+        "codec": 5,
+        "group": 5,
+    }
+    weights.update(raw.get("weights", {}) or {})
+    penalties = {"unknown_token": 5, "max_unknown_penalty": 30}
+    penalties.update(raw.get("penalties", {}) or {})
+    thresholds = {"shitty_min": 60}
+    thresholds.update(raw.get("thresholds", {}) or {})
+    return {
+        "weights": weights,
+        "penalties": penalties,
+        "thresholds": thresholds,
+    }
+
+
+def load_probe_mappings() -> dict:
+    """Load ffprobe→scene-token translation tables.
+
+    Returns a dict with three keys:
+
+      - ``video_codec``: ``{ffprobe_codec_lower: scene_token}``
+      - ``audio_codec``: ``{ffprobe_codec_lower: scene_token}``
+      - ``audio_channels``: ``{channel_count_int: layout_str}``
+
+    Channel-count keys are normalized to ``int`` here so the consumer can
+    look up ``track.channels`` directly. Missing sections fall back to
+    empty dicts — the enrichment code degrades to its uppercase-fallback
+    path when a mapping is absent.
+    """
+    raw = _load("probe_mappings.yaml")
+    video_codec = {k.lower(): v for k, v in (raw.get("video_codec") or {}).items()}
+    audio_codec = {k.lower(): v for k, v in (raw.get("audio_codec") or {}).items()}
+    audio_channels: dict[int, str] = {}
+    for k, v in (raw.get("audio_channels") or {}).items():
+        try:
+            audio_channels[int(k)] = v
+        except (TypeError, ValueError):
+            continue
+    return {
+        "video_codec": video_codec,
+        "audio_codec": audio_codec,
+        "audio_channels": audio_channels,
+    }
+
+
 def load_separators() -> list[str]:
    """Single-char token separators used by the release name tokenizer.

@@ -0,0 +1,127 @@
+"""YamlReleaseKnowledge — concrete adapter for the ``ReleaseKnowledge``
+domain port.
+
+Loads every release-knowledge YAML once at construction time and exposes
+the parsed snapshots as plain attributes. The application layer builds a
+single instance at boot and passes it down to ``parse_release`` and to
+``ParsedRelease`` builder methods.
+
+A few extras (``video_extensions``, ``non_video_extensions``,
+``subtitle_extensions``, ``metadata_extensions``) are not part of the
+domain port — they are consumed by application/infra modules that handle
+filesystem-level concerns.
+"""
+
+from __future__ import annotations
+
+from alfred.domain.release.parser.schema import GroupSchema, SchemaChunk
+from alfred.domain.release.parser.tokens import TokenRole
+
+from .release import (
+    load_audio,
+    load_codecs,
+    load_distributors,
+    load_editions,
+    load_forbidden_chars,
+    load_group_schemas,
+    load_hdr_extra,
+    load_language_tokens,
+    load_media_type_tokens,
+    load_metadata_extensions,
+    load_non_video_extensions,
+    load_probe_mappings,
+    load_resolutions,
+    load_scoring,
+    load_separators,
+    load_sources,
+    load_sources_extra,
+    load_subtitle_extensions,
+    load_video,
+    load_video_extensions,
+    load_win_forbidden_chars,
+)
+
+
+def _build_group_schema(data: dict) -> GroupSchema:
+    """Translate a raw YAML schema dict into a frozen :class:`GroupSchema`.
+
+    Unknown roles raise ``ValueError`` early so a typo in a YAML file
+    surfaces at construction time, not on first parse.
+    """
+    chunks = tuple(
+        SchemaChunk(
+            role=TokenRole(entry["role"]),
+            optional=bool(entry.get("optional", False)),
+        )
+        for entry in data.get("chunk_order", [])
+    )
+    return GroupSchema(
+        name=data["name"],
+        separator=data.get("separator", "."),
+        chunks=chunks,
+    )
+
+
+class YamlReleaseKnowledge:
+    """Single object holding every parsed-release knowledge constant.
+
+    Built once at application boot. Read-only at runtime — call sites
+    treat it as a snapshot. To pick up newly learned tokens without a
+    restart, build a fresh instance and swap it in at the call sites.
+    """
+
+    def __init__(self) -> None:
+        # Domain-port surface
+        self.resolutions: set[str] = load_resolutions()
+        self.sources: set[str] = load_sources() | load_sources_extra()
+        self.codecs: set[str] = load_codecs()
+        self.distributors: set[str] = load_distributors()
+        self.language_tokens: set[str] = load_language_tokens()
+        self.forbidden_chars: set[str] = load_forbidden_chars()
+        self.hdr_extra: set[str] = load_hdr_extra()
+
+        self.audio: dict = load_audio()
+        self.video_meta: dict = load_video()
+        self.editions: dict = load_editions()
+        self.media_type_tokens: dict = load_media_type_tokens()
+
+        self.separators: list[str] = load_separators()
+
+        # Parse-scoring config (weights / penalties / thresholds).
+        self.scoring: dict = load_scoring()
+
+        # ffprobe → scene-token mapping tables (consumed by
+        # ``application.release.enrich_from_probe``).
+        self.probe_mappings: dict = load_probe_mappings()
+
+        # File-extension sets (used by application/infra modules, not by
+        # the parser itself — kept here so there is a single ownership
+        # point for release knowledge).
+        self.video_extensions: set[str] = load_video_extensions()
+        self.non_video_extensions: set[str] = load_non_video_extensions()
+        self.subtitle_extensions: set[str] = load_subtitle_extensions()
+        # Metadata + subtitle extensions are both ignored when deciding
+        # the media type of a folder (neither is a conclusive signal for
+        # movie/tv/other), so we expose the union under the historical
+        # name.
+        self.metadata_extensions: set[str] = (
+            load_metadata_extensions() | self.subtitle_extensions
+        )
+
+        # Translation table for stripping Windows-forbidden chars.
+        self._win_forbidden_table = str.maketrans(
+            "", "", "".join(load_win_forbidden_chars())
+        )
+
+        # Group schemas, keyed by uppercase group name for fast lookup.
+        self._group_schemas: dict[str, GroupSchema] = {
+            key: _build_group_schema(data)
+            for key, data in load_group_schemas().items()
+        }
+
+    def sanitize_for_fs(self, text: str) -> str:
+        """Strip Windows-forbidden characters from ``text``."""
+        return text.translate(self._win_forbidden_table)
+
+    def group_schema(self, name: str) -> GroupSchema | None:
+        return self._group_schemas.get(name.upper())
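The `sanitize_for_fs` helper relies on the three-argument form of `str.maketrans`, where the third argument lists characters to delete. A small standalone sketch follows; the character list is an assumption (the usual Windows-reserved set), since the real one comes from `load_win_forbidden_chars()`, which this diff does not show.

```python
# Assumption: the common Windows-reserved filename characters.
WIN_FORBIDDEN = '<>:"/\\|?*'

# No character-to-character replacements; the third argument is the delete set.
_TABLE = str.maketrans("", "", WIN_FORBIDDEN)


def sanitize_for_fs(text: str) -> str:
    """Drop every forbidden character, leaving all other characters untouched."""
    return text.translate(_TABLE)


cleaned = sanitize_for_fs('Who Am I? (2014): "Director\'s Cut"')
```

Building the table once and reusing it, as the constructor above does, avoids rebuilding it on every call.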
@@ -2,7 +2,7 @@

 import logging

-from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
+from alfred.domain.shared.ports import LanguageRepository
 from alfred.domain.subtitles.value_objects import (
    ScanStrategy,
    SubtitleFormat,
@@ -12,6 +12,8 @@ from alfred.domain.subtitles.value_objects import (
    SubtitleType,
    TypeDetectionMethod,
 )
+from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
+
 from .loader import KnowledgeLoader

 logger = logging.getLogger(__name__)
@@ -28,10 +30,12 @@ class SubtitleKnowledgeBase:
    def __init__(
        self,
        loader: KnowledgeLoader | None = None,
-        language_registry: LanguageRegistry | None = None,
+        language_registry: LanguageRepository | None = None,
    ):
        self._loader = loader or KnowledgeLoader()
-        self._language_registry = language_registry or LanguageRegistry()
+        self._language_registry: LanguageRepository = (
+            language_registry or LanguageRegistry()
+        )
        self._build()

    def _build(self) -> None:  # noqa: PLR0912 — straight-line YAML projection
@@ -7,12 +7,23 @@ import logging
 import subprocess
 from pathlib import Path

+from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
 from alfred.domain.shared.ports import SubtitleStreamInfo

 logger = logging.getLogger(__name__)

 _FFPROBE_TIMEOUT_SECONDS = 30

+_FFPROBE_FULL_CMD = [
+    "ffprobe",
+    "-v",
+    "quiet",
+    "-print_format",
+    "json",
+    "-show_streams",
+    "-show_format",
+]
+

 class FfprobeMediaProber:
    """Inspect media files by shelling out to ``ffprobe``.
@@ -63,3 +74,101 @@ class FfprobeMediaProber:
                )
            )
        return streams
+
+    def probe(self, video: Path) -> MediaInfo | None:
+        """Run ffprobe on ``video`` and return a :class:`MediaInfo`.
+
+        Returns ``None`` when ffprobe is not available, times out, or
+        the file cannot be parsed. Never raises.
+        """
+        try:
+            result = subprocess.run(
+                [*_FFPROBE_FULL_CMD, str(video)],
+                capture_output=True,
+                text=True,
+                timeout=_FFPROBE_TIMEOUT_SECONDS,
+                check=False,
+            )
+        except (subprocess.TimeoutExpired, FileNotFoundError) as e:
+            logger.warning("ffprobe failed on %s: %s", video, e)
+            return None
+
+        if result.returncode != 0:
+            logger.warning("ffprobe failed on %s: %s", video, result.stderr.strip())
+            return None
+
+        try:
+            data = json.loads(result.stdout)
+        except json.JSONDecodeError:
+            logger.warning("ffprobe returned invalid JSON for %s", video)
+            return None
+
+        return _parse_media_info(data)
+
+
+def _parse_media_info(data: dict) -> MediaInfo:
+    """Translate raw ffprobe JSON into a :class:`MediaInfo` snapshot."""
+    streams = data.get("streams", [])
+    fmt = data.get("format", {})
+
+    duration_seconds: float | None = None
+    bitrate_kbps: int | None = None
+    if "duration" in fmt:
+        try:
+            duration_seconds = float(fmt["duration"])
+        except (TypeError, ValueError):
+            pass
+    if "bit_rate" in fmt:
+        try:
+            bitrate_kbps = int(fmt["bit_rate"]) // 1000
+        except (TypeError, ValueError):
+            pass
+
+    video_tracks: list[VideoTrack] = []
+    audio_tracks: list[AudioTrack] = []
+    subtitle_tracks: list[SubtitleTrack] = []
+
+    for stream in streams:
+        codec_type = stream.get("codec_type")
+
+        if codec_type == "video":
+            video_tracks.append(
+                VideoTrack(
+                    index=stream.get("index", len(video_tracks)),
+                    codec=stream.get("codec_name"),
+                    width=stream.get("width"),
+                    height=stream.get("height"),
+                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
+                )
+            )
+
+        elif codec_type == "audio":
+            audio_tracks.append(
+                AudioTrack(
+                    index=stream.get("index", len(audio_tracks)),
+                    codec=stream.get("codec_name"),
+                    channels=stream.get("channels"),
+                    channel_layout=stream.get("channel_layout"),
+                    language=stream.get("tags", {}).get("language"),
+                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
+                )
+            )
+
+        elif codec_type == "subtitle":
+            subtitle_tracks.append(
+                SubtitleTrack(
+                    index=stream.get("index", len(subtitle_tracks)),
+                    codec=stream.get("codec_name"),
+                    language=stream.get("tags", {}).get("language"),
+                    is_default=stream.get("disposition", {}).get("default", 0) == 1,
+                    is_forced=stream.get("disposition", {}).get("forced", 0) == 1,
+                )
+            )
+
+    return MediaInfo(
+        video_tracks=tuple(video_tracks),
+        audio_tracks=tuple(audio_tracks),
+        subtitle_tracks=tuple(subtitle_tracks),
+        duration_seconds=duration_seconds,
+        bitrate_kbps=bitrate_kbps,
+    )
@@ -13,7 +13,7 @@ from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any

-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.application.subtitles.placer import PlacedTrack
 from alfred.infrastructure.metadata.store import MetadataStore

@@ -25,7 +25,7 @@ class SubtitleMetadataStore:
    Subtitle-pipeline view of the per-release `.alfred/metadata.yaml`.

    Backed by a generic MetadataStore; this class only knows how to build
-    a subtitle_history entry from PlacedTrack/SubtitleCandidate pairs.
+    a subtitle_history entry from PlacedTrack/SubtitleScanResult pairs.
    """

    def __init__(self, library_root: Path):
@@ -45,7 +45,7 @@ class SubtitleMetadataStore:

    def append_history(
        self,
-        placed_pairs: list[tuple[PlacedTrack, SubtitleCandidate]],
+        placed_pairs: list[tuple[PlacedTrack, SubtitleScanResult]],
        season: int | None = None,
        episode: int | None = None,
        release_group: str | None = None,
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING
 import yaml

 from alfred.domain.subtitles.aggregates import SubtitleRuleSet
-from alfred.domain.subtitles.value_objects import RuleScope
+from alfred.domain.subtitles.value_objects import RuleScope, RuleScopeLevel

 if TYPE_CHECKING:
    from alfred.infrastructure.persistence.memory.ltm.components.subtitle_preferences import (
@@ -72,7 +72,9 @@ class RuleSetRepository:
            rg_data = _load_yaml(rg_path).get("override", {})
            if rg_data:
                rg_ruleset = SubtitleRuleSet(
-                    scope=RuleScope(level="release_group", identifier=release_group),
+                    scope=RuleScope(
+                        level=RuleScopeLevel.RELEASE_GROUP, identifier=release_group
+                    ),
                    parent=current,
                )
                rg_ruleset.override(**_filter_override(rg_data))
@@ -85,7 +87,7 @@ class RuleSetRepository:
        local_data = _load_yaml(self._alfred_dir / "rules.yaml").get("override", {})
        if local_data:
            local_ruleset = SubtitleRuleSet(
-                scope=RuleScope(level="show"),
+                scope=RuleScope(level=RuleScopeLevel.SHOW),
                parent=current,
            )
            local_ruleset.override(**_filter_override(local_data))
@@ -0,0 +1,17 @@
+# Known streaming distributor tokens (case-insensitive match).
+#
+# These tags identify *which platform* the release was sourced from
+# (Netflix, Amazon, Disney+, …). Distinct from ``sources.yaml`` which
+# captures the encoding origin (WEB-DL, BluRay, …). A typical release
+# carries both: ``Show.S01E01.1080p.NF.WEB-DL.x264-GROUP`` →
+# source=WEB-DL, distributor=NF.
+distributors:
+  - NF      # Netflix
+  - AMZN    # Amazon Prime Video
+  - DSNP    # Disney+
+  - HMAX    # HBO Max
+  - ATVP    # Apple TV+
+  - HULU    # Hulu
+  - PCOK    # Peacock
+  - PMTP    # Paramount+
+  - CR      # Crunchyroll
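To illustrate the source/distributor split described in the header comment, here is a rough sketch that classifies the tokens of the example release name. The two sets are small hypothetical excerpts (the `web-dl` and `webrip` entries are assumed to exist in `sources.yaml`), and `classify` is not part of the codebase.

```python
SOURCES = {"web-dl", "webrip", "bluray"}   # lowercase; partly assumed entries
DISTRIBUTORS = {"NF", "AMZN", "DSNP"}      # uppercase, as load_distributors() returns


def classify(name: str) -> dict:
    """Pick out source, distributor and group from a dot-separated release name."""
    body, _, group = name.rpartition("-")  # group is everything after the final "-"
    found = {"source": None, "distributor": None, "group": group}
    for token in body.split("."):
        if token.lower() in SOURCES:
            found["source"] = token
        elif token.upper() in DISTRIBUTORS:
            found["distributor"] = token.upper()
    return found


result = classify("Show.S01E01.1080p.NF.WEB-DL.x264-GROUP")
```

A real parser would also need positional context, since a short tag such as `CR` or `NF` could otherwise match a word in the title.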
@@ -0,0 +1,45 @@
+# Translation table — ffprobe output → scene-style release tokens.
+#
+# Consumed by ``alfred.application.release.enrich_from_probe`` when filling
+# missing ParsedRelease fields from a probed MediaInfo. Token-level values
+# from the release name always win; these mappings only fire when the
+# corresponding ParsedRelease field is None.
+#
+# Lookup is case-insensitive on the key side (ffprobe sometimes emits
+# uppercase, sometimes lowercase). When no key matches, the fallback is
+# ``ffprobe_value.upper()`` so unknown codecs still surface in a
+# predictable form (and signal the gap to a future "learn" pass).
+#
+# Each section is a flat dict — values are the canonical scene tokens
+# Alfred uses everywhere (filename builders, ParsedRelease fields).
+
+# ffprobe video codec name → scene codec token
+video_codec:
+  hevc: x265
+  h264: x264
+  h265: x265
+  av1: AV1
+  vp9: VP9
+  mpeg4: XviD
+
+# ffprobe audio codec name → scene audio token
+audio_codec:
+  eac3: EAC3
+  ac3: AC3
+  dts: DTS
+  truehd: TrueHD
+  aac: AAC
+  flac: FLAC
+  opus: OPUS
+  mp3: MP3
+  pcm_s16le: PCM
+  pcm_s24le: PCM
+
+# Channel count (integer) → standard layout string.
+# Keys are written as quoted strings here; the loader converts them to
+# int and skips any key that cannot be parsed as an integer.
+audio_channels:
+  "8": "7.1"
+  "6": "5.1"
+  "2": "2.0"
+  "1": "1.0"
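The lookup rules in the header comment (case-insensitive keys, uppercase fallback for unknown codecs) can be sketched as below. `MAPPINGS` is a hypothetical in-memory excerpt of the table, and `scene_token` / `channel_layout` are illustrative names, not the functions used by `enrich_from_probe`. Returning `None` for an unmapped channel count is an assumption; the diff does not say what happens in that case.

```python
from typing import Optional

# Hypothetical excerpt, shaped like the dict load_probe_mappings() returns.
MAPPINGS = {
    "video_codec": {"hevc": "x265", "h264": "x264"},
    "audio_codec": {"eac3": "EAC3", "truehd": "TrueHD"},
    "audio_channels": {6: "5.1", 2: "2.0"},
}


def scene_token(section: str, ffprobe_value: Optional[str]) -> Optional[str]:
    """Map an ffprobe codec name to a scene token, uppercasing unknown values."""
    if not ffprobe_value:
        return None
    table = MAPPINGS.get(section, {})
    return table.get(ffprobe_value.lower(), ffprobe_value.upper())


def channel_layout(channels: Optional[int]) -> Optional[str]:
    """Map an integer channel count to a layout string (None when unmapped)."""
    if channels is None:
        return None
    return MAPPINGS["audio_channels"].get(channels)
```

Because the loader already lowercases keys and converts channel keys to `int`, the consumer side stays a plain dictionary lookup.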
@@ -0,0 +1,22 @@
+# ELiTE release naming schema.
+#
+# Examples seen in the wild:
+#   Foundation.S02.1080p.x265-ELiTE             (TV season pack, no source)
+#
+# ELiTE often omits the source token entirely on TV releases (no WEBRip /
+# BluRay), going straight from resolution to codec.
+
+name: ELiTE
+separator: "."
+
+chunk_order:
+  - role: title
+  - role: year
+    optional: true
+  - role: season_episode
+    optional: true
+  - role: resolution
+  - role: source
+    optional: true             # often absent on TV
+  - role: codec
+  - role: group
@@ -0,0 +1,28 @@
+# KONTRAST release naming schema.
+#
+# Examples seen in the wild:
+#   Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST            (movie)
+#   The.Long.Walk.2025.1080p.WEBRip.x265-KONTRAST             (movie)
+#   Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST             (TV episode)
+#   Slow.Horses.S05.1080p.WEBRip.x265-KONTRAST                (TV season pack)
+#
+# Schema is a left-to-right description of the canonical chunk order.
+# Each entry is a role (matching TokenRole). Optional chunks are marked
+# with `optional: true`. The parser consumes tokens greedily by role,
+# skipping over optional chunks that don't match.
+
+name: KONTRAST
+separator: "."
+
+# Canonical order of structural + technical chunks (left to right).
+# `title` is special-cased as "everything up to the first non-title role".
+chunk_order:
+  - role: title
+  - role: year
+    optional: true             # absent on TV releases (S01E01 instead)
+  - role: season_episode
+    optional: true             # absent on movies
+  - role: resolution           # always present (1080p, 2160p, …)
+  - role: source               # always present (WEBRip, BluRay, …)
+  - role: codec                # always present (x265, x264, …)
+  - role: group                # everything after the final `-`
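The left-to-right matching described in the comments above might look roughly like the following sketch. It is a simplification under stated assumptions: `Chunk`, `MATCHERS` and `match_schema` are hypothetical, the regexes stand in for the YAML-driven token classification, and titles containing a year-like or resolution-like word are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    role: str
    optional: bool = False


# Hypothetical per-role matchers; the real parser uses the knowledge base.
MATCHERS = {
    "year": re.compile(r"^(19|20)\d{2}$"),
    "season_episode": re.compile(r"^S\d{2}(E\d{2})?$", re.IGNORECASE),
    "resolution": re.compile(r"^\d{3,4}p$", re.IGNORECASE),
    "source": re.compile(r"^(WEBRip|WEB-DL|BluRay)$", re.IGNORECASE),
    "codec": re.compile(r"^[xh]26[45]$", re.IGNORECASE),
}

KONTRAST = (
    Chunk("title"), Chunk("year", True), Chunk("season_episode", True),
    Chunk("resolution"), Chunk("source"), Chunk("codec"), Chunk("group"),
)
# Same order, but the source chunk is optional, as in the ELiTE schema.
ELITE = tuple(Chunk(c.role, True) if c.role == "source" else c for c in KONTRAST)


def match_schema(name: str, schema: tuple, separator: str = ".") -> Optional[dict]:
    """Return a role-to-token dict if the name fits the schema, else None."""
    body, _, group = name.rpartition("-")  # group: everything after the final "-"
    tokens = body.split(separator)
    result = {"group": group}
    pos = 0
    for chunk in schema:
        if chunk.role == "group":
            continue  # already taken from the tail
        if chunk.role == "title":
            start = pos
            # Title: everything up to the first token some other role recognises.
            while pos < len(tokens) and not any(
                m.match(tokens[pos]) for m in MATCHERS.values()
            ):
                pos += 1
            if pos == start:
                return None
            result["title"] = separator.join(tokens[start:pos])
        elif pos < len(tokens) and MATCHERS[chunk.role].match(tokens[pos]):
            result[chunk.role] = tokens[pos]
            pos += 1
        elif not chunk.optional:
            return None  # a mandatory chunk is missing
    return result if pos == len(tokens) else None


episode = match_schema("Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST", KONTRAST)
no_source = "Foundation.S02.1080p.x265-ELiTE"
```

With these definitions, the source-less name fails against the KONTRAST order (source is mandatory there) but fits the ELiTE order, which is the reason the two groups get separate schema files.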
@@ -0,0 +1,20 @@
+# RARBG release naming schema.
+#
+# RARBG follows the canonical scene convention closely:
+#   Title.Year.Resolution.Source.Codec-RARBG
+# For TV:
+#   Title.S01E01.Resolution.Source.Codec-RARBG
+
+name: RARBG
+separator: "."
+
+chunk_order:
+  - role: title
+  - role: year
+    optional: true
+  - role: season_episode
+    optional: true
+  - role: resolution
+  - role: source
+  - role: codec
+  - role: group
@@ -0,0 +1,42 @@
+# Release parse scoring.
+#
+# `parse_release` returns a `ParseReport` alongside the `ParsedRelease`.
+# The report carries a 0-100 confidence score computed from the annotated
+# tokens, plus the road decision (EASY / SHITTY / PATH_OF_PAIN).
+#
+# Why YAML: the weights and the SHITTY/PoP cutoff are tuning knobs we
+# expect to iterate on as fixtures grow. Keeping them in code would
+# mean a commit per tweak; here the user can adjust without touching
+# Python.
+#
+# Weights are awarded when the corresponding ParsedRelease field is
+# populated (non-None, non-"UNKNOWN" for group). Season and episode
+# only contribute when the parse looks like TV (season is not None).
+
+weights:
+  title:       30   # structural pivot — without it nothing else matters
+  media_type:  20   # movie / tv_show / tv_complete / …
+  year:        15
+  season:      10   # only counted for TV-shaped releases
+  episode:     5
+  resolution:  5
+  source:      5
+  codec:       5
+  group:       5    # "UNKNOWN" yields 0
+
+# Penalty applied per UNKNOWN token left in the annotated stream.
+# The total is capped at `max_unknown_penalty` so that a long tail of
+# garbage tokens cannot push every release into PoP.
+penalties:
+  unknown_token:        5
+  max_unknown_penalty:  30
+
+# Decision thresholds.
+#
+# EASY is decided structurally (a known group schema matched) — it does
+# not look at the score. SHITTY vs PATH_OF_PAIN is decided here:
+#
+#   score >= shitty_min  → SHITTY (best-effort parse usable)
+#   score <  shitty_min  → PATH_OF_PAIN (needs user / LLM help)
+thresholds:
+  shitty_min: 60
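As a worked illustration of the rules in this file, the sketch below computes a score and a road decision from the default values. `score_parse` and `road` are hypothetical names; the actual scoring code is not part of this excerpt, and clamping the result to the 0-100 range is an assumption.

```python
CONFIG = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
        "resolution": 5, "source": 5, "codec": 5, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 30},
    "thresholds": {"shitty_min": 60},
}


def score_parse(fields: dict, unknown_tokens: int, cfg: dict = CONFIG) -> int:
    """Sum the weights of populated fields, then subtract the capped penalty."""
    is_tv = fields.get("season") is not None
    total = 0
    for name, weight in cfg["weights"].items():
        value = fields.get(name)
        if value is None:
            continue
        if name == "group" and value == "UNKNOWN":
            continue  # an unknown group yields 0
        if name in ("season", "episode") and not is_tv:
            continue  # only counted for TV-shaped releases
        total += weight
    penalties = cfg["penalties"]
    penalty = min(
        unknown_tokens * penalties["unknown_token"],
        penalties["max_unknown_penalty"],
    )
    return max(0, min(100, total - penalty))  # clamp: assumption


def road(score: int, schema_matched: bool, cfg: dict = CONFIG) -> str:
    """EASY is structural; otherwise the threshold separates SHITTY from PoP."""
    if schema_matched:
        return "EASY"
    if score >= cfg["thresholds"]["shitty_min"]:
        return "SHITTY"
    return "PATH_OF_PAIN"


movie = {
    "title": "Movie", "media_type": "movie", "year": 2020,
    "resolution": "1080p", "source": "WEBRip", "codec": "x265", "group": "UNKNOWN",
}
movie_score = score_parse(movie, unknown_tokens=1)  # 80 from weights, minus 5
```

Here the movie collects 30 + 20 + 15 + 5 + 5 + 5 = 80 points (the `UNKNOWN` group adds nothing), loses 5 for one unknown token, and lands at 75, which is above `shitty_min`.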
@@ -21,3 +21,4 @@ separators:
  - "("   # parenthesis-embedded (year, edition): (2020) (Director's Cut)
  - ")"
  - "_"   # underscore-as-space (old usenet, some Asian releases)
+  - "｜"  # fullwidth vertical bar U+FF5C (CJK release names, occasional decorative use)
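A quick sketch of how single-character separators, including the new fullwidth bar, could drive tokenization. The list is a partial, assumed copy of `separators.yaml` (only the last four entries are visible in this hunk), and `tokenize` is illustrative rather than the tokenizer used by the parser.

```python
import re

# Partial list: "." and " " are assumed; the rest appear in the hunk above.
SEPARATORS = [".", " ", "(", ")", "_", "\uff5c"]  # "\uff5c" is the fullwidth bar

# One character class built from the escaped separators; runs collapse together.
_SPLIT = re.compile("[" + re.escape("".join(SEPARATORS)) + "]+")


def tokenize(name: str) -> list:
    """Split a release name on any run of separator characters."""
    return [token for token in _SPLIT.split(name) if token]


tokens = tokenize("Movie_Title\uff5c2020\uff5c1080p")
```

The fullwidth bar U+FF5C is a different code point from the ASCII `|` (U+007C), so ASCII pipes are left inside their token unless they are listed separately.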
@@ -1,4 +1,9 @@
-# Known release source tokens (case-insensitive match)
+# Known release source tokens (case-insensitive match).
+#
+# "Source" here means the capture/encoding origin (disc, broadcast, web
+# stream) — NOT the streaming distributor (Netflix, Disney+, …). Those
+# live in ``distributors.yaml`` because they're a separate dimension:
+# a release is typically "WEB-DL from NF" — both should be captured.
 sources:
  - bluray
  - blu-ray
@@ -14,8 +19,3 @@ sources:
  - dvdrip
  - dvd
  - vodrip
-  - amzn
-  - nf
-  - dsnp
-  - hmax
-  - atvp
@@ -37,12 +37,6 @@ class Settings(BaseSettings):
    llm_temperature: float = 0.2
    data_storage_dir: str = "data"

-    # --- MEDIA ---
-    # Minimum file size to consider a video file as a real movie (in bytes).
-    # 100 MB is generous enough to skip sample clips / trailers without rejecting
-    # legitimate low-bitrate releases (e.g. older anime, certain web rips).
-    min_movie_size_bytes: int = 100 * 1024 * 1024
-
    # --- BUILD ---
    alfred_version: str | None = None

@@ -90,15 +84,6 @@ class Settings(BaseSettings):
            )
        return v

-    @field_validator("min_movie_size_bytes")
-    @classmethod
-    def validate_min_movie_size(cls, v: int) -> int:
-        if v < 0:
-            raise ConfigurationError(
-                f"min_movie_size_bytes must be non-negative, got {v}"
-            )
-        return v
-
    @field_validator("request_timeout")
    @classmethod
    def validate_timeout(cls, v: int) -> int:
@@ -88,13 +88,13 @@ def analyze(release_name: str, source_path: str | None = None) -> None:
        if not path.exists():
            print("  (chemin inexistant, probe skipped)")
        else:
-            from alfred.infrastructure.filesystem.ffprobe import probe
            from alfred.infrastructure.filesystem.find_video import find_video_file
+            from alfred.infrastructure.probe import FfprobeMediaProber

            video = find_video_file(path) if path.is_dir() else path
            if video:
                print(f"  video file: {video.name}")
-                info = probe(video)
+                info = FfprobeMediaProber().probe(video)
                if info:
                    print(f"  codec: {info.video_codec}")
                    print(f"  resolution: {info.resolution}")
@@ -124,8 +124,16 @@ def dry_run(release_name: str) -> None:
    from alfred.application.filesystem.resolve_destination import (
        resolve_season_destination,
    )
+    from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+    from alfred.infrastructure.probe import FfprobeMediaProber

-    result = resolve_season_destination(release_name, tmdb_title, tmdb_year)
+    result = resolve_season_destination(
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        YamlReleaseKnowledge(),
+        FfprobeMediaProber(),
+    )
    d = result.to_dict()
    print()
    print(json.dumps(d, indent=2, ensure_ascii=False))
@@ -203,8 +211,16 @@ def do_move(release_name: str, source_folder: str | None = None) -> None:
    from alfred.application.filesystem.resolve_destination import (
        resolve_season_destination,
    )
+    from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+    from alfred.infrastructure.probe import FfprobeMediaProber

-    result = resolve_season_destination(release_name, tmdb_title, tmdb_year)
+    result = resolve_season_destination(
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        YamlReleaseKnowledge(),
+        FfprobeMediaProber(),
+    )
    d = result.to_dict()

    if d["status"] == "needs_clarification":
@@ -98,9 +98,9 @@ def main() -> None:
        print(c(f"Error: {path} does not exist", RED), file=sys.stderr)
        sys.exit(1)

-    from alfred.infrastructure.filesystem.ffprobe import probe
+    from alfred.infrastructure.probe import FfprobeMediaProber

-    info = probe(path)
+    info = FfprobeMediaProber().probe(path)
    if info is None:
        print(c("Error: ffprobe failed to probe the file", RED), file=sys.stderr)
        sys.exit(1)
@@ -100,11 +100,18 @@ def main() -> None:
        print(c(f"Error: {downloads} does not exist", RED), file=sys.stderr)
        sys.exit(1)

-    from alfred.application.filesystem.detect_media_type import detect_media_type
-    from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
+    from dataclasses import replace
+
+    from alfred.application.release.detect_media_type import detect_media_type
+    from alfred.application.release.enrich_from_probe import enrich_from_probe
    from alfred.domain.release.services import parse_release
-    from alfred.infrastructure.filesystem.ffprobe import probe
+    from alfred.domain.release.value_objects import MediaTypeToken
    from alfred.infrastructure.filesystem.find_video import find_video_file
+    from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+    from alfred.infrastructure.probe import FfprobeMediaProber
+
+    _kb = YamlReleaseKnowledge()
+    _prober = FfprobeMediaProber()

    entries = sorted(downloads.iterdir(), key=lambda p: p.name.lower())
    total = len(entries)
@@ -121,14 +128,14 @@ def main() -> None:
        name = entry.name

        try:
-            p = parse_release(name)
-            p.media_type = detect_media_type(p, entry)
+            p, _report = parse_release(name, _kb)
+            p = replace(p, media_type=MediaTypeToken(detect_media_type(p, entry, _kb)))
            if p.media_type not in ("unknown", "other"):
                video_file = find_video_file(entry)
                if video_file:
-                    media_info = probe(video_file)
+                    media_info = _prober.probe(video_file)
                    if media_info:
-                        enrich_from_probe(p, media_info)
+                        p = enrich_from_probe(p, media_info, _kb)
            warnings = _assess(p)
        except Exception as e:
            warnings = [f"parse error: {e}"]
@@ -1,4 +1,4 @@
-"""Tests for ``alfred.application.filesystem.detect_media_type``.
+"""Tests for ``alfred.application.release.detect_media_type``.

 The function refines a ``ParsedRelease.media_type`` using filesystem evidence.

@@ -18,18 +18,24 @@ from pathlib import Path

 import pytest

-from alfred.application.filesystem.detect_media_type import detect_media_type
+from alfred.application.release.detect_media_type import detect_media_type
 from alfred.domain.release.services import parse_release
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()


 def _parsed(media_type: str = "movie"):
    """Build a ParsedRelease with the requested media_type via the real parser."""
    if media_type == "tv_show":
-        return parse_release("Show.S01E01.1080p-GRP")
+        parsed, _ = parse_release("Show.S01E01.1080p-GRP", _KB)
+        return parsed
    if media_type == "movie":
-        return parse_release("Movie.2020.1080p-GRP")
+        parsed, _ = parse_release("Movie.2020.1080p-GRP", _KB)
+        return parsed
    # "unknown" / other — feed a name the parser can't classify
-    return parse_release("randomthing")
+    parsed, _ = parse_release("randomthing", _KB)
+    return parsed


 # --------------------------------------------------------------------------- #
@@ -41,30 +47,30 @@ class TestFile:
    def test_video_file_preserves_parsed_type(self, tmp_path: Path):
        f = tmp_path / "x.mkv"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), f) == "movie"
+        assert detect_media_type(_parsed("movie"), f, _KB) == "movie"

    def test_video_file_preserves_tv_type(self, tmp_path: Path):
        f = tmp_path / "ep.mp4"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("tv_show"), f) == "tv_show"
+        assert detect_media_type(_parsed("tv_show"), f, _KB) == "tv_show"

    def test_non_video_file_returns_other(self, tmp_path: Path):
        f = tmp_path / "x.iso"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), f) == "other"
+        assert detect_media_type(_parsed("movie"), f, _KB) == "other"

    @pytest.mark.parametrize("ext", [".rar", ".zip", ".7z", ".exe", ".dmg"])
    def test_various_non_video_extensions(self, tmp_path: Path, ext):
        f = tmp_path / f"x{ext}"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), f) == "other"
+        assert detect_media_type(_parsed("movie"), f, _KB) == "other"

    def test_metadata_only_file_keeps_parsed_type(self, tmp_path: Path):
        # Metadata extension is stripped from conclusive set — no video, no
        # non-video → falls through to parsed.media_type.
        f = tmp_path / "x.nfo"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), f) == "movie"
+        assert detect_media_type(_parsed("movie"), f, _KB) == "movie"


 # --------------------------------------------------------------------------- #
@@ -75,27 +81,27 @@ class TestFile:
 class TestFolder:
    def test_folder_with_video_keeps_parsed_type(self, tmp_path: Path):
        (tmp_path / "main.mkv").write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), tmp_path) == "movie"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "movie"

    def test_folder_only_non_video_returns_other(self, tmp_path: Path):
        (tmp_path / "disc.iso").write_bytes(b"")
        (tmp_path / "part.rar").write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), tmp_path) == "other"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "other"

    def test_folder_mixed_returns_unknown(self, tmp_path: Path):
        (tmp_path / "main.mkv").write_bytes(b"")
        (tmp_path / "extras.iso").write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), tmp_path) == "unknown"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "unknown"

    def test_empty_folder_keeps_parsed_type(self, tmp_path: Path):
-        assert detect_media_type(_parsed("tv_show"), tmp_path) == "tv_show"
+        assert detect_media_type(_parsed("tv_show"), tmp_path, _KB) == "tv_show"

    def test_folder_only_metadata_keeps_parsed_type(self, tmp_path: Path):
        (tmp_path / "info.nfo").write_bytes(b"")
        (tmp_path / "cover.jpg").write_bytes(b"")
        (tmp_path / "subs.srt").write_bytes(b"")
        # All metadata → conclusive set empty → falls through.
-        assert detect_media_type(_parsed("movie"), tmp_path) == "movie"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "movie"


 # --------------------------------------------------------------------------- #
@@ -109,18 +115,18 @@ class TestMetadataIgnored:
        (tmp_path / "info.nfo").write_bytes(b"")
        (tmp_path / "cover.jpg").write_bytes(b"")
        (tmp_path / "subs.srt").write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), tmp_path) == "movie"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "movie"

    def test_non_video_plus_metadata_still_other(self, tmp_path: Path):
        (tmp_path / "disc.iso").write_bytes(b"")
        (tmp_path / "info.nfo").write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), tmp_path) == "other"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "other"

    def test_case_insensitive_extensions(self, tmp_path: Path):
        # Suffix is lowercased before classification.
        f = tmp_path / "X.MKV"
        f.write_bytes(b"")
-        assert detect_media_type(_parsed("movie"), f) == "movie"
+        assert detect_media_type(_parsed("movie"), f, _KB) == "movie"


 # --------------------------------------------------------------------------- #
@@ -132,11 +138,11 @@ class TestMissing:
    def test_nonexistent_path_keeps_parsed_type(self, tmp_path: Path):
        missing = tmp_path / "does_not_exist.mkv"
        # Doesn't exist → empty extension set → falls through.
-        assert detect_media_type(_parsed("movie"), missing) == "movie"
+        assert detect_media_type(_parsed("movie"), missing, _KB) == "movie"

    def test_nonexistent_folder_keeps_parsed_type(self, tmp_path: Path):
        missing = tmp_path / "ghost"
-        assert detect_media_type(_parsed("tv_show"), missing) == "tv_show"
+        assert detect_media_type(_parsed("tv_show"), missing, _KB) == "tv_show"

    def test_subfolder_not_recursed(self, tmp_path: Path):
        # _collect_extensions scans only the first level — files inside
@@ -145,4 +151,4 @@ class TestMissing:
        sub.mkdir()
        (sub / "deep.mkv").write_bytes(b"")
        # Top level has no files at all → empty → falls through to parsed type.
-        assert detect_media_type(_parsed("movie"), tmp_path) == "movie"
+        assert detect_media_type(_parsed("movie"), tmp_path, _KB) == "movie"
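The classification rules these tests pin down can be summarised in a short sketch. This is not the project's `detect_media_type`: the extension sets below are hypothetical stand-ins for what the knowledge base (`_KB`) supplies, and `detect_media_type_sketch` / `collect_extensions` are illustrative names. It assumes only what the assertions above show: suffixes are lowercased, only the first level of a folder is scanned, metadata-like extensions are ignored, and an inconclusive scan falls through to the parsed type.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical extension sets for illustration only; in the tests above the
# real sets are supplied through the knowledge base argument (_KB).
VIDEO_EXTS = {".mkv", ".mp4", ".avi"}
NON_VIDEO_EXTS = {".iso", ".rar", ".zip", ".7z", ".exe", ".dmg"}


def collect_extensions(path: Path) -> set[str]:
    """Lowercased suffixes of a file, or of a folder's first-level files."""
    if path.is_file():
        return {path.suffix.lower()}
    if path.is_dir():
        # Subfolders are deliberately not recursed into.
        return {p.suffix.lower() for p in path.iterdir() if p.is_file()}
    return set()  # missing path: nothing to classify


def detect_media_type_sketch(parsed_type: str, path: Path) -> str:
    exts = collect_extensions(path)
    has_video = bool(exts & VIDEO_EXTS)
    has_non_video = bool(exts & NON_VIDEO_EXTS)
    if has_video and has_non_video:
        return "unknown"  # mixed content is ambiguous
    if has_non_video:
        return "other"
    # Video only, metadata only, empty, or missing: keep the parsed type.
    return parsed_type
```

Anything outside the two sets (such as `.nfo`, `.jpg`, `.srt`) is treated as inconclusive here, which is one plausible reading of "metadata is stripped from the conclusive set".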
@@ -1,8 +1,8 @@
-"""Tests for ``alfred.application.filesystem.enrich_from_probe``.
+"""Tests for ``alfred.application.release.enrich_from_probe``.

-The function mutates a ``ParsedRelease`` in place using ffprobe ``MediaInfo``.
-Token-level values from the release name always win — only ``None`` fields
-are filled.
+The function returns a new ``ParsedRelease`` with ``None`` fields filled
+from ffprobe ``MediaInfo``. Token-level values from the release name
+always win — only ``None`` fields are filled.

 Coverage:

@@ -18,9 +18,12 @@ Uses real ``ParsedRelease`` / ``MediaInfo`` instances — no mocking needed.

 from __future__ import annotations

-from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
+from alfred.application.release.enrich_from_probe import enrich_from_probe
 from alfred.domain.release.value_objects import ParsedRelease
 from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()


 def _info_with_video(*, width=None, height=None, codec=None, **rest) -> MediaInfo:
@@ -35,8 +38,9 @@ def _bare(**overrides) -> ParsedRelease:
    """Build a minimal ParsedRelease with all enrichable fields = None."""
    defaults = dict(
        raw="X",
-        normalised="X",
+        clean="X",
        title="X",
+        title_sanitized="X",
        year=None,
        season=None,
        episode=None,
@@ -45,7 +49,6 @@ def _bare(**overrides) -> ParsedRelease:
        source=None,
        codec=None,
        group="UNKNOWN",
-        tech_string="",
    )
    defaults.update(overrides)
    return ParsedRelease(**defaults)
@@ -59,17 +62,17 @@ def _bare(**overrides) -> ParsedRelease:
 class TestQuality:
    def test_fills_when_none(self):
        p = _bare()
-        enrich_from_probe(p, _info_with_video(width=1920, height=1080))
+        p = enrich_from_probe(p, _info_with_video(width=1920, height=1080), _KB)
        assert p.quality == "1080p"

    def test_does_not_overwrite_existing(self):
        p = _bare(quality="2160p")
-        enrich_from_probe(p, _info_with_video(width=1920, height=1080))
+        p = enrich_from_probe(p, _info_with_video(width=1920, height=1080), _KB)
        assert p.quality == "2160p"

    def test_no_dims_leaves_none(self):
        p = _bare()
-        enrich_from_probe(p, MediaInfo())
+        p = enrich_from_probe(p, MediaInfo(), _KB)
        assert p.quality is None


@@ -81,27 +84,27 @@ class TestQuality:
 class TestVideoCodec:
    def test_hevc_to_x265(self):
        p = _bare()
-        enrich_from_probe(p, _info_with_video(codec="hevc"))
+        p = enrich_from_probe(p, _info_with_video(codec="hevc"), _KB)
        assert p.codec == "x265"

    def test_h264_to_x264(self):
        p = _bare()
-        enrich_from_probe(p, _info_with_video(codec="h264"))
+        p = enrich_from_probe(p, _info_with_video(codec="h264"), _KB)
        assert p.codec == "x264"

    def test_unknown_codec_uppercased(self):
        p = _bare()
-        enrich_from_probe(p, _info_with_video(codec="weird"))
+        p = enrich_from_probe(p, _info_with_video(codec="weird"), _KB)
        assert p.codec == "WEIRD"

    def test_does_not_overwrite_existing(self):
        p = _bare(codec="HEVC")
-        enrich_from_probe(p, _info_with_video(codec="h264"))
+        p = enrich_from_probe(p, _info_with_video(codec="h264"), _KB)
        assert p.codec == "HEVC"

    def test_no_codec_leaves_none(self):
        p = _bare()
-        enrich_from_probe(p, MediaInfo())
+        p = enrich_from_probe(p, MediaInfo(), _KB)
        assert p.codec is None


@@ -119,7 +122,7 @@ class TestAudio:
            ]
        )
        p = _bare()
-        enrich_from_probe(p, info)
+        p = enrich_from_probe(p, info, _KB)
        assert p.audio_codec == "EAC3"
        assert p.audio_channels == "5.1"

@@ -131,32 +134,32 @@ class TestAudio:
            ]
        )
        p = _bare()
-        enrich_from_probe(p, info)
+        p = enrich_from_probe(p, info, _KB)
        assert p.audio_codec == "AC3"
        assert p.audio_channels == "5.1"

    def test_channel_count_unknown_falls_back(self):
        info = MediaInfo(audio_tracks=[AudioTrack(0, "aac", 4, "quad", "eng")])
        p = _bare()
-        enrich_from_probe(p, info)
+        p = enrich_from_probe(p, info, _KB)
        assert p.audio_channels == "4ch"

    def test_unknown_audio_codec_uppercased(self):
        info = MediaInfo(audio_tracks=[AudioTrack(0, "newcodec", 2, "stereo", "eng")])
        p = _bare()
-        enrich_from_probe(p, info)
+        p = enrich_from_probe(p, info, _KB)
        assert p.audio_codec == "NEWCODEC"

    def test_no_audio_tracks(self):
        p = _bare()
-        enrich_from_probe(p, MediaInfo())
+        p = enrich_from_probe(p, MediaInfo(), _KB)
        assert p.audio_codec is None
        assert p.audio_channels is None

    def test_does_not_overwrite_existing_audio_fields(self):
        info = MediaInfo(audio_tracks=[AudioTrack(0, "ac3", 6, "5.1", "eng")])
        p = _bare(audio_codec="DTS-HD.MA", audio_channels="7.1")
-        enrich_from_probe(p, info)
+        p = enrich_from_probe(p, info, _KB)
        assert p.audio_codec == "DTS-HD.MA"
        assert p.audio_channels == "7.1"

@@ -175,8 +178,8 @@ class TestLanguages:
            ]
        )
        p = _bare()
-        enrich_from_probe(p, info)
-        assert p.languages == ["eng", "fre"]
+        p = enrich_from_probe(p, info, _KB)
+        assert p.languages == ("eng", "fre")

    def test_skips_und(self):
        info = MediaInfo(
@@ -186,8 +189,8 @@ class TestLanguages:
            ]
        )
        p = _bare()
-        enrich_from_probe(p, info)
-        assert p.languages == ["eng"]
+        p = enrich_from_probe(p, info, _KB)
+        assert p.languages == ("eng",)

    def test_dedup_against_existing_case_insensitive(self):
        # existing token-level languages are typically upper-case ("FRENCH", "ENG")
@@ -199,13 +202,52 @@ class TestLanguages:
                AudioTrack(1, "aac", 2, "stereo", "fre"),
            ]
        )
-        p = _bare()
-        p.languages = ["ENG"]
-        enrich_from_probe(p, info)
+        p = _bare(languages=("ENG",))
+        p = enrich_from_probe(p, info, _KB)
        # "eng" → upper "ENG" already present → skipped. "fre" → "FRE" new → kept.
-        assert p.languages == ["ENG", "fre"]
+        assert p.languages == ("ENG", "fre")

    def test_no_audio_tracks_leaves_languages_empty(self):
        p = _bare()
-        enrich_from_probe(p, MediaInfo())
-        assert p.languages == []
+        p = enrich_from_probe(p, MediaInfo(), _KB)
+        assert p.languages == ()
+
+
+# --------------------------------------------------------------------------- #
+# tech_string                                                                  #
+# --------------------------------------------------------------------------- #
+
+
+class TestTechString:
+    """tech_string is a derived property on ParsedRelease: it always
+    reflects the current quality/source/codec. Enrichment never writes
+    it directly — it stays in sync by construction."""
+
+    def test_rebuilt_from_filled_quality_and_codec(self):
+        p = _bare()
+        p = enrich_from_probe(
+            p, _info_with_video(width=1920, height=1080, codec="hevc"), _KB
+        )
+        assert p.quality == "1080p"
+        assert p.codec == "x265"
+        assert p.tech_string == "1080p.x265"
+
+    def test_keeps_existing_source_when_enriching(self):
+        # Token-level source must stay; probe fills only None fields.
+        p = _bare(source="BluRay")
+        p = enrich_from_probe(
+            p, _info_with_video(width=1920, height=1080, codec="hevc"), _KB
+        )
+        assert p.tech_string == "1080p.BluRay.x265"
+
+    def test_unchanged_when_no_enrichable_video_info(self):
+        # No video info → nothing to fill → derived tech_string stays as it was.
+        p = _bare(quality="2160p", source="WEB-DL", codec="x265")
+        assert p.tech_string == "2160p.WEB-DL.x265"
+        p = enrich_from_probe(p, MediaInfo(), _KB)
+        assert p.tech_string == "2160p.WEB-DL.x265"
+
+    def test_empty_when_nothing_known(self):
+        p = _bare()
+        p = enrich_from_probe(p, MediaInfo(), _KB)
+        assert p.tech_string == ""
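The "fill only `None` fields and return a new object" contract exercised above can be sketched with a frozen dataclass and `dataclasses.replace`. The `Release` class, the `enrich` function, the codec alias table, and the `f"{height}p"` quality rule are all assumptions made for illustration; the real `ParsedRelease` and `enrich_from_probe` take a `MediaInfo` and a knowledge base and may differ in detail.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Assumed alias table; the tests only show hevc -> x265 and h264 -> x264.
_CODEC_ALIASES = {"hevc": "x265", "h264": "x264"}


@dataclass(frozen=True)
class Release:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    languages: tuple[str, ...] = ()

    @property
    def tech_string(self) -> str:
        # Derived on access, so it cannot drift from the fields it reflects.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


def enrich(release: Release, *, height=None, codec=None, track_languages=()) -> Release:
    updates: dict = {}
    if release.quality is None and height is not None:
        updates["quality"] = f"{height}p"  # assumed height-to-quality rule
    if release.codec is None and codec is not None:
        updates["codec"] = _CODEC_ALIASES.get(codec.lower(), codec.upper())
    # Append probe languages, skipping "und" and case-insensitive duplicates.
    seen = {lang.upper() for lang in release.languages}
    merged = list(release.languages)
    for lang in track_languages:
        if lang == "und" or lang.upper() in seen:
            continue
        seen.add(lang.upper())
        merged.append(lang)
    if len(merged) != len(release.languages):
        updates["languages"] = tuple(merged)
    # Token-level values win: only the collected updates are applied.
    return replace(release, **updates)
```

Because the input is frozen, callers must rebind the result (`p = enrich(p, ...)`), which is the same call-site change the tests above make.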
@@ -0,0 +1,356 @@
+"""Tests for the ``inspect_release`` orchestrator (Phase C).
+
+Covers the four composition steps as a black box: a real
+``YamlReleaseKnowledge``, real on-disk filesystem under ``tmp_path``,
+and a stubbed ``MediaProber`` so we don't depend on a system ``ffprobe``.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from alfred.application.release import InspectedResult, inspect_release
+from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+_MOVIE_NAME = "Inception.2010.1080p.BluRay.x264-GROUP"
+_TV_NAME = "Dexter.S01E01.1080p.WEB-DL.x264-GROUP"
+
+
+# --------------------------------------------------------------------------- #
+# Test doubles                                                                 #
+# --------------------------------------------------------------------------- #
+
+
+class _StubProber:
+    """Minimal MediaProber stub. Records the path it was asked to probe."""
+
+    def __init__(self, info: MediaInfo | None) -> None:
+        self._info = info
+        self.calls: list[Path] = []
+
+    def list_subtitle_streams(self, video: Path):  # pragma: no cover - unused here
+        return []
+
+    def probe(self, video: Path) -> MediaInfo | None:
+        self.calls.append(video)
+        return self._info
+
+
+class _RaisingProber:
+    """A prober that would explode if called — used to assert no probe."""
+
+    def list_subtitle_streams(self, video: Path):  # pragma: no cover
+        raise AssertionError("list_subtitle_streams must not be called")
+
+    def probe(self, video: Path):  # pragma: no cover
+        raise AssertionError("probe must not be called")
+
+
+def _media_info_1080p_h264() -> MediaInfo:
+    return MediaInfo(
+        video_tracks=(VideoTrack(index=0, codec="h264", width=1920, height=1080),),
+        audio_tracks=(
+            AudioTrack(
+                index=1,
+                codec="ac3",
+                channels=6,
+                channel_layout="5.1",
+                language="eng",
+                is_default=True,
+            ),
+        ),
+        subtitle_tracks=(),
+        duration_seconds=7200.0,
+        bitrate_kbps=8000,
+    )
+
+
+# --------------------------------------------------------------------------- #
+# Happy paths                                                                  #
+# --------------------------------------------------------------------------- #
+
+
+class TestInspectMovieFolder:
+    def test_returns_inspected_result_with_all_fields(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        video = folder / "movie.mkv"
+        video.write_bytes(b"")
+        prober = _StubProber(_media_info_1080p_h264())
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert isinstance(result, InspectedResult)
+        assert result.source_path == folder
+        assert result.main_video == video
+        assert result.media_info is not None
+        assert result.probe_used is True
+        assert prober.calls == [video]
+
+    def test_parsed_carries_token_level_fields(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        prober = _StubProber(_media_info_1080p_h264())
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert result.parsed.title.lower().startswith("inception")
+        assert result.parsed.year == 2010
+        assert result.parsed.group == "GROUP"
+        assert result.parsed.media_type == "movie"
+
+    def test_report_has_confidence_and_road(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        prober = _StubProber(None)
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert 0 <= result.report.confidence <= 100
+        assert result.report.road in ("easy", "shitty", "path_of_pain")
+
+
+class TestInspectSingleFile:
+    def test_file_is_its_own_main_video(self, tmp_path: Path) -> None:
+        f = tmp_path / f"{_MOVIE_NAME}.mkv"
+        f.write_bytes(b"")
+        prober = _StubProber(_media_info_1080p_h264())
+
+        result = inspect_release(_MOVIE_NAME, f, _KB, prober)
+
+        assert result.main_video == f
+        assert result.probe_used is True
+
+
+# --------------------------------------------------------------------------- #
+# Probe-gating logic                                                           #
+# --------------------------------------------------------------------------- #
+
+
+class TestProbeGating:
+    def test_no_video_means_no_probe(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        # Only a non-video file present.
+        (folder / "readme.txt").write_text("hi")
+        prober = _RaisingProber()
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert result.main_video is None
+        assert result.media_info is None
+        assert result.probe_used is False
+
+    def test_media_type_other_means_no_probe(self, tmp_path: Path) -> None:
+        # An ISO-only folder gets detect_media_type → "other".
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "disc.iso").write_bytes(b"")
+        prober = _RaisingProber()
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert result.parsed.media_type == "other"
+        assert result.media_info is None
+        assert result.probe_used is False
+
+    def test_probe_failure_keeps_probe_used_false(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        prober = _StubProber(None)  # ffprobe simulated as failing
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert result.main_video is not None
+        assert result.media_info is None
+        assert result.probe_used is False
+
+
+# --------------------------------------------------------------------------- #
+# Mutation contract                                                            #
+# --------------------------------------------------------------------------- #
+
+
+class TestMutationContract:
+    def test_detect_media_type_refines_parsed(self, tmp_path: Path) -> None:
+        # Release name parses to "movie", but the folder mixes video + non-video
+        # (e.g. an ISO sitting next to an mkv) → detect_media_type returns
+        # "unknown", which is in _NON_PROBABLE_MEDIA_TYPES → no probe.
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        (folder / "extras.iso").write_bytes(b"")
+        prober = _RaisingProber()
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        assert result.parsed.media_type == "unknown"
+        assert result.probe_used is False
+
+    def test_enrich_runs_when_probe_succeeds(self, tmp_path: Path) -> None:
+        # Build a release name with no codec; probe should fill it in.
+        name = "Inception.2010.1080p.BluRay-GROUP"
+        folder = tmp_path / name
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        prober = _StubProber(_media_info_1080p_h264())
+
+        result = inspect_release(name, folder, _KB, prober)
+
+        assert result.probe_used is True
+        # enrich_from_probe should have filled the missing codec field.
+        assert result.parsed.codec is not None
+
+
+# --------------------------------------------------------------------------- #
+# Resilience                                                                   #
+# --------------------------------------------------------------------------- #
+
+
+class TestResilience:
+    def test_nonexistent_path_does_not_raise(self, tmp_path: Path) -> None:
+        ghost = tmp_path / "does-not-exist"
+        prober = _RaisingProber()
+
+        result = inspect_release(_MOVIE_NAME, ghost, _KB, prober)
+
+        assert result.main_video is None
+        assert result.media_info is None
+        assert result.probe_used is False
+
+    def test_tv_release_inspection(self, tmp_path: Path) -> None:
+        folder = tmp_path / _TV_NAME
+        folder.mkdir()
+        video = folder / "episode.mkv"
+        video.write_bytes(b"")
+        prober = _StubProber(_media_info_1080p_h264())
+
+        result = inspect_release(_TV_NAME, folder, _KB, prober)
+
+        assert result.parsed.media_type == "tv_show"
+        assert result.parsed.season == 1
+        assert result.parsed.episode == 1
+        assert result.main_video == video
+        assert result.probe_used is True
+
+
+# --------------------------------------------------------------------------- #
+# Frozen contract                                                              #
+# --------------------------------------------------------------------------- #
+
+
+class TestFrozen:
+    def test_inspected_result_is_frozen(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        prober = _StubProber(None)
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
+
+        # frozen=True → assigning a field raises FrozenInstanceError.
+        import dataclasses
+
+        try:
+            result.probe_used = True  # type: ignore[misc]
+        except dataclasses.FrozenInstanceError:
+            pass
+        else:  # pragma: no cover
+            raise AssertionError("InspectedResult should be frozen")
+
+
+# --------------------------------------------------------------------------- #
+# recommended_action                                                           #
+# --------------------------------------------------------------------------- #
+
+
+class TestRecommendedAction:
+    """``recommended_action`` collapses the orchestrator's go / wait /
+    skip decision into a single property. The check ordering is part
+    of the contract (skip wins over ask_user, ask_user wins over
+    process) — see the property docstring."""
+
+    def test_skip_when_no_main_video(self, tmp_path: Path) -> None:
+        # Folder with no video at all → main_video is None → skip.
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "readme.txt").write_text("hi")
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, _RaisingProber())
+
+        assert result.main_video is None
+        assert result.recommended_action == "skip"
+
+    def test_skip_when_media_type_other(self, tmp_path: Path) -> None:
+        # Folder with only non-video files (ISO) → media_type == "other"
+        # AND main_video is None (find_main_video filters by video ext).
+        # Both branches resolve to "skip"; this asserts the contract holds.
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "disc.iso").write_bytes(b"")
+
+        result = inspect_release(_MOVIE_NAME, folder, _KB, _RaisingProber())
+
+        assert result.parsed.media_type == "other"
+        assert result.recommended_action == "skip"
+
+    def test_ask_user_when_media_type_unknown(self, tmp_path: Path) -> None:
+        # Mixed video + non-video → detect_media_type returns "unknown".
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+        (folder / "extras.iso").write_bytes(b"")
+
+        result = inspect_release(
+            _MOVIE_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
+        )
+
+        assert result.parsed.media_type == "unknown"
+        assert result.recommended_action == "ask_user"
+
+    def test_ask_user_when_path_of_pain_road(self, tmp_path: Path) -> None:
+        # Malformed name (forbidden chars) → road == "path_of_pain".
+        name = "garbage@#%name"
+        folder = tmp_path / "release"
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+
+        result = inspect_release(
+            name, folder, _KB, _StubProber(_media_info_1080p_h264())
+        )
+
+        assert result.report.road == "path_of_pain"
+        # main_video is found but the road still flags uncertainty.
+        assert result.main_video is not None
+        assert result.recommended_action == "ask_user"
+
+    def test_process_for_confident_movie(self, tmp_path: Path) -> None:
+        folder = tmp_path / _MOVIE_NAME
+        folder.mkdir()
+        (folder / "movie.mkv").write_bytes(b"")
+
+        result = inspect_release(
+            _MOVIE_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
+        )
+
+        assert result.parsed.media_type == "movie"
+        assert result.report.road in ("easy", "shitty")
+        assert result.recommended_action == "process"
+
+    def test_process_for_confident_tv_show(self, tmp_path: Path) -> None:
+        folder = tmp_path / _TV_NAME
+        folder.mkdir()
+        (folder / "episode.mkv").write_bytes(b"")
+
+        result = inspect_release(
+            _TV_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
+        )
+
+        assert result.parsed.media_type == "tv_show"
+        assert result.recommended_action == "process"
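The check ordering that `TestRecommendedAction` describes (skip wins over ask_user, ask_user wins over process) can be written as a small decision function. This is a sketch inferred from the assertions above, not the actual property on `InspectedResult`; the function name and its flat argument list are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def recommended_action_sketch(
    main_video: Path | None, media_type: str, road: str
) -> str:
    # 1. Nothing processable: no main video, or a non-video release.
    if main_video is None or media_type == "other":
        return "skip"
    # 2. Processable but uncertain: ambiguous content or a malformed name.
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    # 3. Confident parse with a main video.
    return "process"
```

Evaluating the skip conditions first means a release with no video never prompts the user, even when its name also landed on the `path_of_pain` road.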
@@ -40,7 +40,7 @@ from alfred.application.filesystem.manage_subtitles import (
    _to_imdb_id,
    _to_unresolved_dto,
 )
-from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleCandidate
+from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleScanResult
 from alfred.application.subtitles.placer import PlacedTrack, PlaceResult
 from alfred.domain.subtitles.value_objects import (
    ScanStrategy,
@@ -63,8 +63,8 @@ def _track(
    is_embedded: bool = False,
    raw_tokens: list[str] | None = None,
    file_size_kb: float | None = None,
-) -> SubtitleCandidate:
-    return SubtitleCandidate(
+) -> SubtitleScanResult:
+    return SubtitleScanResult(
        language=lang,
        format=fmt,
        subtitle_type=stype,
@@ -9,7 +9,6 @@ Four use cases compute library paths from a release name + TMDB metadata:

 Coverage:

- ``TestSanitize`` — Windows-forbidden chars stripped.
 - ``TestFindExistingTvshowFolders`` — empty root, prefix match (case + space → dot).
 - ``TestResolveSeriesFolderInternal`` — confirmed_folder, no existing, single match,
  ambiguous → _Clarification.
@@ -32,14 +31,53 @@ from alfred.application.filesystem.resolve_destination import (
    _Clarification,
    _find_existing_tvshow_folders,
    _resolve_series_folder,
-    _sanitize,
-    resolve_episode_destination,
-    resolve_movie_destination,
-    resolve_season_destination,
-    resolve_series_destination,
+    resolve_episode_destination as _resolve_episode_destination,
+    resolve_movie_destination as _resolve_movie_destination,
+    resolve_season_destination as _resolve_season_destination,
+    resolve_series_destination as _resolve_series_destination,
 )
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
 from alfred.infrastructure.persistence import Memory, set_memory

+_KB = YamlReleaseKnowledge()
+
+
+class _NullProber:
+    """Default prober stub — never returns probe data."""
+
+    def list_subtitle_streams(self, video):  # pragma: no cover
+        return []
+
+    def probe(self, video):
+        return None
+
+
+_DEFAULT_PROBER = _NullProber()
+
+
+def resolve_season_destination(*args, prober=None, **kwargs):
+    return _resolve_season_destination(
+        *args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
+    )
+
+
+def resolve_episode_destination(*args, prober=None, **kwargs):
+    return _resolve_episode_destination(
+        *args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
+    )
+
+
+def resolve_movie_destination(*args, prober=None, **kwargs):
+    return _resolve_movie_destination(
+        *args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
+    )
+
+
+def resolve_series_destination(*args, prober=None, **kwargs):
+    return _resolve_series_destination(
+        *args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
+    )
+
 REL_EPISODE = "Oz.S01E01.1080p.WEBRip.x265-KONTRAST"
 REL_SEASON = "Oz.S03.1080p.WEBRip.x265-KONTRAST"
 REL_MOVIE = "Inception.2010.1080p.BluRay.x265-GROUP"
@@ -51,15 +89,6 @@ REL_SERIES = "Oz.Complete.Series.1080p.WEBRip.x265-KONTRAST"
 # --------------------------------------------------------------------------- #


-class TestSanitize:
-    def test_passthrough_safe_chars(self):
-        assert _sanitize("Oz.1997.1080p-GRP") == "Oz.1997.1080p-GRP"
-
-    def test_strips_windows_forbidden(self):
-        # ? : * " < > | \
-        assert _sanitize('a?b:c*d"e<f>g|h\\i') == "abcdefghi"
-
-
 # --------------------------------------------------------------------------- #
 # _find_existing_tvshow_folders                                                #
 # --------------------------------------------------------------------------- #
@@ -107,6 +136,7 @@ class TestResolveSeriesFolderInternal:
        out = _resolve_series_folder(
            tmp_path,
            "Oz",
+            "Oz",
            1997,
            "Oz.1997.WEBRip-KONTRAST",
            confirmed_folder="Oz.1997.X-GRP",
@@ -117,6 +147,7 @@ class TestResolveSeriesFolderInternal:
        out = _resolve_series_folder(
            tmp_path,
            "Oz",
+            "Oz",
            1997,
            "Oz.1997.WEBRip-KONTRAST",
            confirmed_folder="Oz.1997.New-X",
@@ -125,21 +156,21 @@ class TestResolveSeriesFolderInternal:

    def test_no_existing_returns_computed_as_new(self, tmp_path):
        out = _resolve_series_folder(
-            tmp_path, "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
+            tmp_path, "Oz", "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
        )
        assert out == ("Oz.1997.WEBRip-KONTRAST", True)

    def test_single_existing_matching_computed_returns_existing(self, tmp_path):
        (tmp_path / "Oz.1997.WEBRip-KONTRAST").mkdir()
        out = _resolve_series_folder(
-            tmp_path, "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
+            tmp_path, "Oz", "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
        )
        assert out == ("Oz.1997.WEBRip-KONTRAST", False)

    def test_single_existing_different_name_returns_clarification(self, tmp_path):
        (tmp_path / "Oz.1997.BluRay-OTHER").mkdir()
        out = _resolve_series_folder(
-            tmp_path, "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
+            tmp_path, "Oz", "Oz", 1997, "Oz.1997.WEBRip-KONTRAST", None
        )
        assert isinstance(out, _Clarification)
        assert "Oz" in out.question
@@ -149,7 +180,7 @@ class TestResolveSeriesFolderInternal:
    def test_multiple_existing_returns_clarification(self, tmp_path):
        (tmp_path / "Oz.1997.A-GRP").mkdir()
        (tmp_path / "Oz.1997.B-GRP").mkdir()
-        out = _resolve_series_folder(tmp_path, "Oz", 1997, "Oz.1997.A-GRP", None)
+        out = _resolve_series_folder(tmp_path, "Oz", "Oz", 1997, "Oz.1997.A-GRP", None)
        assert isinstance(out, _Clarification)
        # Computed already in existing → not duplicated.
        assert out.options.count("Oz.1997.A-GRP") == 1
@@ -331,6 +362,102 @@ class TestSeries:
        assert out.status == "needs_clarification"


+# --------------------------------------------------------------------------- #
+# Probe enrichment wiring                                                      #
+# --------------------------------------------------------------------------- #
+
+
+class _StubProber:
+    """Minimal MediaProber stub used to drive enrich_from_probe."""
+
+    def __init__(self, info):
+        self._info = info
+
+    def list_subtitle_streams(self, video):  # pragma: no cover - unused here
+        return []
+
+    def probe(self, video):
+        return self._info
+
+
+def _stereo_movie_info():
+    """A MediaInfo that fills quality+codec when the release name omits them."""
+    from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
+
+    return MediaInfo(
+        video_tracks=(VideoTrack(index=0, codec="hevc", width=1920, height=1080),),
+        audio_tracks=(
+            AudioTrack(
+                index=1,
+                codec="aac",
+                channels=2,
+                channel_layout="stereo",
+                language="eng",
+                is_default=True,
+            ),
+        ),
+        subtitle_tracks=(),
+    )
+
+
+class TestProbeEnrichmentWiring:
+    """When source_path/source_file points to an existing file or folder,
+    the resolver should pick up ffprobe data via inspect_release and let
+    the enriched tech_string land in the destination name."""
+
+    def test_movie_picks_up_probe_quality(self, cfg_memory, tmp_path):
+        # Release name parses to "movie" but is missing the quality token;
+        # probe must supply 1080p and refresh tech_string.
+        bare_name = "Inception.2010.BluRay.x264-GROUP"
+        video = tmp_path / "movie.mkv"
+        video.write_bytes(b"")
+
+        out = resolve_movie_destination(
+            bare_name,
+            str(video),
+            "Inception",
+            2010,
+            prober=_StubProber(_stereo_movie_info()),
+        )
+
+        assert out.status == "ok"
+        # tech_string -> "1080p.BluRay.x264" -> "1080p" shows up in names.
+        assert "1080p" in out.movie_folder_name
+        assert "1080p" in out.filename
+
+    def test_movie_skips_probe_when_path_missing(self, cfg_memory):
+        # If the file doesn't exist, no probe runs (the stub would have
+        # injected 1080p — its absence proves the skip).
+        out = resolve_movie_destination(
+            "Inception.2010.BluRay.x264-GROUP",
+            "/nowhere/m.mkv",
+            "Inception",
+            2010,
+            prober=_StubProber(_stereo_movie_info()),
+        )
+        assert out.status == "ok"
+        assert "1080p" not in out.movie_folder_name
+
+    def test_season_picks_up_probe_via_source_path(self, cfg_memory, tmp_path):
+        # Season pack name missing quality token; probe must add it.
+        bare_name = "Oz.S03.BluRay.x265-KONTRAST"
+        release_dir = tmp_path / bare_name
+        release_dir.mkdir()
+        (release_dir / "episode.mkv").write_bytes(b"")
+
+        out = resolve_season_destination(
+            bare_name,
+            "Oz",
+            1997,
+            source_path=str(release_dir),
+            prober=_StubProber(_stereo_movie_info()),
+        )
+
+        assert out.status == "ok"
+        # Series folder name embeds tech_string -> "1080p" surfaced by probe.
+        assert "1080p" in out.series_folder_name
+
+
 # --------------------------------------------------------------------------- #
 # DTO to_dict()                                                                #
 # --------------------------------------------------------------------------- #
@@ -21,7 +21,7 @@ from unittest.mock import patch

 import pytest

-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.application.subtitles.placer import (
    PlacedTrack,
    PlaceResult,
@@ -46,8 +46,8 @@ def _track(
    fmt=SRT,
    stype=SubtitleType.STANDARD,
    is_embedded: bool = False,
-) -> SubtitleCandidate:
-    return SubtitleCandidate(
+) -> SubtitleScanResult:
+    return SubtitleScanResult(
        language=lang,
        format=fmt,
        subtitle_type=stype,
@@ -0,0 +1,130 @@
+"""Tests for the pre-pipeline exclusion helpers (Phase A bis)."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+from alfred.application.release.supported_media import (
+    find_main_video,
+    is_supported_video,
+)
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+
+# --------------------------------------------------------------------- #
+# is_supported_video                                                    #
+# --------------------------------------------------------------------- #
+
+
+class TestIsSupportedVideo:
+    def test_mkv_is_supported(self, tmp_path: Path) -> None:
+        f = tmp_path / "movie.mkv"
+        f.touch()
+        assert is_supported_video(f, _KB) is True
+
+    def test_mp4_is_supported(self, tmp_path: Path) -> None:
+        f = tmp_path / "movie.mp4"
+        f.touch()
+        assert is_supported_video(f, _KB) is True
+
+    def test_uppercase_extension_is_supported(self, tmp_path: Path) -> None:
+        # File systems can return mixed case; we lowercase the suffix.
+        f = tmp_path / "movie.MKV"
+        f.touch()
+        assert is_supported_video(f, _KB) is True
+
+    def test_srt_is_not_video(self, tmp_path: Path) -> None:
+        f = tmp_path / "movie.srt"
+        f.touch()
+        assert is_supported_video(f, _KB) is False
+
+    def test_nfo_is_not_video(self, tmp_path: Path) -> None:
+        f = tmp_path / "movie.nfo"
+        f.touch()
+        assert is_supported_video(f, _KB) is False
+
+    def test_no_extension_is_not_video(self, tmp_path: Path) -> None:
+        f = tmp_path / "README"
+        f.touch()
+        assert is_supported_video(f, _KB) is False
+
+    def test_directory_is_not_video(self, tmp_path: Path) -> None:
+        d = tmp_path / "subdir.mkv"  # even with a video extension
+        d.mkdir()
+        assert is_supported_video(d, _KB) is False
+
+    def test_nonexistent_path_is_not_video(self, tmp_path: Path) -> None:
+        assert is_supported_video(tmp_path / "ghost.mkv", _KB) is False
+
+
+# --------------------------------------------------------------------- #
+# find_main_video                                                       #
+# --------------------------------------------------------------------- #
+
+
+class TestFindMainVideo:
+    def test_single_video_file_in_folder(self, tmp_path: Path) -> None:
+        main = tmp_path / "Movie.2020.mkv"
+        main.touch()
+        assert find_main_video(tmp_path, _KB) == main
+
+    def test_returns_lexicographically_first_among_multiple(
+        self, tmp_path: Path
+    ) -> None:
+        # Legitimate for season packs: pick the first episode by name.
+        ep2 = tmp_path / "Show.S01E02.mkv"
+        ep1 = tmp_path / "Show.S01E01.mkv"
+        ep2.touch()
+        ep1.touch()
+        assert find_main_video(tmp_path, _KB) == ep1
+
+    def test_skips_non_video_files(self, tmp_path: Path) -> None:
+        # Names chosen so the nfo and srt sort before the .mkv; they must not win.
+        (tmp_path / "Movie.2020.nfo").touch()
+        (tmp_path / "Movie.2020.srt").touch()
+        vid = tmp_path / "Movie.mkv"
+        vid.touch()
+        assert find_main_video(tmp_path, _KB) == vid
+
+    def test_ignores_subdirectories(self, tmp_path: Path) -> None:
+        # A Sample/ subdir must NOT be descended into.
+        sample_dir = tmp_path / "Sample"
+        sample_dir.mkdir()
+        (sample_dir / "sample.mkv").touch()
+        main = tmp_path / "Movie.mkv"
+        main.touch()
+        assert find_main_video(tmp_path, _KB) == main
+
+    def test_only_subdirectory_with_video_returns_none(
+        self, tmp_path: Path
+    ) -> None:
+        # No top-level video, only one inside a subdir → None.
+        sub = tmp_path / "Sample"
+        sub.mkdir()
+        (sub / "video.mkv").touch()
+        assert find_main_video(tmp_path, _KB) is None
+
+    def test_empty_folder_returns_none(self, tmp_path: Path) -> None:
+        assert find_main_video(tmp_path, _KB) is None
+
+    def test_nonexistent_folder_returns_none(self, tmp_path: Path) -> None:
+        assert find_main_video(tmp_path / "ghost", _KB) is None
+
+    def test_single_file_release_passed_as_folder_arg(
+        self, tmp_path: Path
+    ) -> None:
+        # Some releases are a bare .mkv with no enclosing folder.
+        f = tmp_path / "Movie.2020.1080p.mkv"
+        f.touch()
+        assert find_main_video(f, _KB) == f
+
+    def test_single_file_non_video_passed_as_folder_arg(
+        self, tmp_path: Path
+    ) -> None:
+        f = tmp_path / "README.nfo"
+        f.touch()
+        assert find_main_video(f, _KB) is None
@@ -0,0 +1,216 @@
+"""EASY-path tests for the v2 annotate-based pipeline.
+
+These tests assert that the **v2 pipeline itself** produces the correct
+annotated stream and assembled fields for releases from known groups
+(KONTRAST, ELiTE, …) — without going through ``parse_release``. The
+fixtures suite (``tests/domain/test_release_fixtures.py``) already
+locks the user-visible ``ParsedRelease`` contract; here we cover the
+internal pipeline behavior so a future refactor of ``parse_release``
+can't quietly drop EASY without us noticing.
+"""
+
+from __future__ import annotations
+
+from alfred.domain.release.parser import TokenRole
+from alfred.domain.release.parser.pipeline import (
+    _detect_group,
+    annotate,
+    assemble,
+    tokenize,
+)
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+
+class TestDetectGroup:
+    def test_codec_group(self) -> None:
+        tokens, _ = tokenize(
+            "Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB
+        )
+        name, idx = _detect_group(tokens, _KB)
+        assert name == "KONTRAST"
+        assert idx == 6  # x265-KONTRAST is the 7th token
+
+    def test_unknown_when_no_dash(self) -> None:
+        tokens, _ = tokenize("Some.Movie.2020.1080p.WEBRip.x265.KONTRAST", _KB)
+        # No dash anywhere → no group detected.
+        name, idx = _detect_group(tokens, _KB)
+        assert idx is None
+        assert name == "UNKNOWN"
+
+    def test_skips_dashed_source(self) -> None:
+        # "Web-DL" must not be mistaken for a group token.
+        tokens, _ = tokenize("Movie.2020.1080p.Web-DL.x265-GRP", _KB)
+        name, idx = _detect_group(tokens, _KB)
+        assert name == "GRP"
+
+
+class TestAnnotateEasy:
+    def test_kontrast_movie(self) -> None:
+        tokens, _ = tokenize(
+            "Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB
+        )
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None, "KONTRAST should hit the EASY path"
+
+        roles = [t.role for t in annotated]
+        assert roles == [
+            TokenRole.TITLE,  # Back
+            TokenRole.TITLE,  # in
+            TokenRole.TITLE,  # Action
+            TokenRole.YEAR,
+            TokenRole.RESOLUTION,
+            TokenRole.SOURCE,
+            TokenRole.CODEC,  # x265-KONTRAST → CODEC with extra.group=KONTRAST
+        ]
+        assert annotated[-1].extra["group"] == "KONTRAST"
+        assert annotated[-1].extra["codec"] == "x265"
+
+    def test_kontrast_tv_episode(self) -> None:
+        tokens, _ = tokenize(
+            "Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST", _KB
+        )
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+
+        # Year is optional and absent here, so it is skipped; SEASON_EPISODE is present.
+        roles = [t.role for t in annotated]
+        assert TokenRole.SEASON_EPISODE in roles
+        assert TokenRole.YEAR not in roles
+
+    def test_elite_no_source(self) -> None:
+        # ELiTE schema marks source as optional — Foundation.S02 omits it.
+        tokens, _ = tokenize("Foundation.S02.1080p.x265-ELiTE", _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None, "ELiTE optional source must be tolerated"
+
+        roles = [t.role for t in annotated]
+        assert TokenRole.SOURCE not in roles
+        assert TokenRole.RESOLUTION in roles
+        assert TokenRole.CODEC in roles
+
+    def test_unknown_group_falls_to_shitty(self) -> None:
+        tokens, _ = tokenize("Some.Movie.2020.1080p.WEBRip.x264-RANDOM", _KB)
+        # RANDOM is not in our release_groups/ — annotate() now falls
+        # through to the in-pipeline SHITTY pass and returns a populated
+        # token list (no None sentinel anymore).
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        roles = [t.role for t in annotated]
+        # Title is "Some.Movie", then YEAR, RESOLUTION, SOURCE, CODEC
+        # carrying the group in extra.
+        assert TokenRole.TITLE in roles
+        assert TokenRole.YEAR in roles
+        assert TokenRole.RESOLUTION in roles
+        assert TokenRole.SOURCE in roles
+        assert TokenRole.CODEC in roles
+        codec_tok = next(t for t in annotated if t.role is TokenRole.CODEC)
+        assert codec_tok.extra.get("group") == "RANDOM"
+
+
+class TestAssemble:
+    def test_kontrast_movie_fields(self) -> None:
+        name = "Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Back.in.Action"
+        assert fields["year"] == 2025
+        assert fields["season"] is None
+        assert fields["quality"] == "1080p"
+        assert fields["source"] == "WEBRip"
+        assert fields["codec"] == "x265"
+        assert fields["group"] == "KONTRAST"
+        assert fields["media_type"] == "movie"
+        assert fields["site_tag"] is None
+
+    def test_kontrast_tv_fields(self) -> None:
+        name = "Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Slow.Horses"
+        assert fields["year"] is None
+        assert fields["season"] == 5
+        assert fields["episode"] == 1
+        assert fields["media_type"] == "tv_show"
+        assert fields["group"] == "KONTRAST"
+
+    def test_elite_season_pack(self) -> None:
+        name = "Foundation.S02.1080p.x265-ELiTE"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Foundation"
+        assert fields["season"] == 2
+        assert fields["episode"] is None  # season pack
+        assert fields["source"] is None  # ELiTE omits it
+        assert fields["quality"] == "1080p"
+        assert fields["codec"] == "x265"
+        assert fields["group"] == "ELiTE"
+
+
+class TestEnrichers:
+    """Non-positional roles populated alongside the structural walk.
+
+    These releases would have failed the v2 EASY path before the enricher
+    pass landed (leftover unknown tokens would force a fallback). They
+    now succeed in v2 with rich metadata.
+    """
+
+    def test_bit_depth_and_audio(self) -> None:
+        name = "Back.in.Action.2025.1080p.WEBRip.10bit.DDP.5.1.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Back.in.Action"
+        assert fields["bit_depth"] == "10bit"
+        assert fields["audio_codec"] == "DDP"
+        assert fields["audio_channels"] == "5.1"
+
+    def test_hdr_sequence(self) -> None:
+        # DV.HDR10 sequence + TrueHD.Atmos sequence + 7.1 channels +
+        # DIRECTORS.CUT edition all in one release.
+        name = (
+            "Some.Movie.2024.DIRECTORS.CUT.2160p.BluRay.DV.HDR10."
+            "TrueHD.Atmos.7.1.x265-KONTRAST"
+        )
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["edition"] == "DIRECTORS.CUT"
+        assert fields["hdr_format"] == "DV.HDR10"
+        assert fields["audio_codec"] == "TrueHD.Atmos"
+        assert fields["audio_channels"] == "7.1"
+
+    def test_multiple_languages(self) -> None:
+        name = "Movie.2020.FRENCH.MULTI.1080p.WEBRip.DTS.HD.MA.5.1.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["languages"] == ("FRENCH", "MULTI")
+        assert fields["audio_codec"] == "DTS-HD.MA"
+        assert fields["audio_channels"] == "5.1"
+
+    def test_tv_with_language(self) -> None:
+        name = "Show.S01E05.FRENCH.1080p.WEBRip.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Show"
+        assert fields["season"] == 1
+        assert fields["episode"] == 5
+        assert fields["languages"] == ("FRENCH",)
+        assert fields["media_type"] == "tv_show"
@@ -0,0 +1,79 @@
+"""Scaffolding tests for the v2 parser package.
+
+These tests lock the **shape** of the new pipeline (token VOs, tokenize
+output, site-tag stripping) before the annotate step is wired in. They
+do not check parsed-release output yet — that comes once :func:`annotate`
+is implemented and the fixtures-based suite switches over.
+"""
+
+from __future__ import annotations
+
+from alfred.domain.release.parser import Token, TokenRole
+from alfred.domain.release.parser.pipeline import strip_site_tag, tokenize
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+
+class TestToken:
+    def test_default_role_is_unknown(self) -> None:
+        t = Token(text="1080p", index=3)
+        assert t.role is TokenRole.UNKNOWN
+        assert not t.is_annotated
+
+    def test_with_role_returns_new_instance(self) -> None:
+        t = Token(text="1080p", index=3)
+        promoted = t.with_role(TokenRole.RESOLUTION)
+        assert promoted is not t
+        assert promoted.role is TokenRole.RESOLUTION
+        assert t.role is TokenRole.UNKNOWN  # original unchanged (frozen)
+
+    def test_with_role_merges_extra(self) -> None:
+        t = Token(text="x265-KONTRAST", index=5)
+        promoted = t.with_role(TokenRole.CODEC, group="KONTRAST")
+        assert promoted.role is TokenRole.CODEC
+        assert promoted.extra == {"group": "KONTRAST"}
+
+
+class TestStripSiteTag:
+    def test_no_tag(self) -> None:
+        clean, tag = strip_site_tag("The.Movie.2020.1080p-GRP")
+        assert tag is None
+        assert clean == "The.Movie.2020.1080p-GRP"
+
+    def test_suffix_tag(self) -> None:
+        clean, tag = strip_site_tag("Sinners.2025.1080p-[YTS.MX]")
+        assert tag == "YTS.MX"
+        assert clean == "Sinners.2025.1080p-"
+
+    def test_prefix_tag(self) -> None:
+        clean, tag = strip_site_tag("[ OxTorrent.vc ] The.Title.S01E01")
+        assert tag == "OxTorrent.vc"
+        assert clean == "The.Title.S01E01"
+
+
+class TestTokenize:
+    def test_simple_release(self) -> None:
+        tokens, tag = tokenize("Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB)
+        assert tag is None
+        texts = [t.text for t in tokens]
+        # Dash is not a separator, so x265-KONTRAST stays glued.
+        assert texts == [
+            "Back", "in", "Action", "2025", "1080p", "WEBRip", "x265-KONTRAST",
+        ]
+
+    def test_all_tokens_start_unknown(self) -> None:
+        tokens, _ = tokenize("Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB)
+        assert all(t.role is TokenRole.UNKNOWN for t in tokens)
+
+    def test_indexes_are_contiguous(self) -> None:
+        tokens, _ = tokenize("A.B.C.D", _KB)
+        assert [t.index for t in tokens] == [0, 1, 2, 3]
+
+    def test_strips_site_tag_before_tokenize(self) -> None:
+        tokens, tag = tokenize(
+            "Sinners.2025.1080p.WEBRip.x265.10bit.AAC5.1-[YTS.MX]", _KB
+        )
+        assert tag == "YTS.MX"
+        # Site tag substring must not appear among tokens.
+        assert not any("YTS" in t.text for t in tokens)
@@ -0,0 +1,279 @@
+"""Phase A — parse-confidence scoring.
+
+These tests pin the score / road semantics without going through
+fixtures. They exercise the small pure functions in
+``alfred.domain.release.parser.scoring`` and the end-to-end contract
+that ``parse_release`` returns a ``(ParsedRelease, ParseReport)`` tuple.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from alfred.domain.release.parser.scoring import (
+    Road,
+    collect_missing_critical,
+    collect_unknown_tokens,
+    compute_score,
+    decide_road,
+)
+from alfred.domain.release.parser.tokens import Token, TokenRole
+from alfred.domain.release.services import parse_release
+from alfred.domain.release.value_objects import (
+    MediaTypeToken,
+    ParsedRelease,
+    ParseReport,
+    TokenizationRoute,
+)
+from alfred.domain.shared.exceptions import ValidationError
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+
+# --------------------------------------------------------------------- #
+# ParseReport VO                                                        #
+# --------------------------------------------------------------------- #
+
+
+class TestParseReport:
+    def test_construct_with_defaults(self) -> None:
+        report = ParseReport(confidence=80, road="easy")
+        assert report.confidence == 80
+        assert report.road == "easy"
+        assert report.unknown_tokens == ()
+        assert report.missing_critical == ()
+
+    def test_is_frozen(self) -> None:
+        report = ParseReport(confidence=50, road="shitty")
+        with pytest.raises(Exception):  # FrozenInstanceError
+            report.confidence = 99  # type: ignore[misc]
+
+    def test_confidence_lower_bound(self) -> None:
+        with pytest.raises(ValidationError):
+            ParseReport(confidence=-1, road="easy")
+
+    def test_confidence_upper_bound(self) -> None:
+        with pytest.raises(ValidationError):
+            ParseReport(confidence=101, road="easy")
+
+
+# --------------------------------------------------------------------- #
+# compute_score                                                         #
+# --------------------------------------------------------------------- #
+
+
+def _movie(year: int | None = 2010, **overrides) -> ParsedRelease:
+    """Build a populated movie ParsedRelease for scoring tests."""
+    base = dict(
+        raw="Inception.2010.1080p.BluRay.x264-GROUP",
+        clean="Inception.2010.1080p.BluRay.x264-GROUP",
+        title="Inception",
+        title_sanitized="Inception",
+        year=year,
+        season=None,
+        episode=None,
+        episode_end=None,
+        quality="1080p",
+        source="BluRay",
+        codec="x264",
+        group="GROUP",
+        media_type=MediaTypeToken.MOVIE,
+        parse_path=TokenizationRoute.DIRECT,
+    )
+    base.update(overrides)
+    return ParsedRelease(**base)
+
+
+def _all_annotated() -> list[Token]:
+    """Token stream where everything is annotated — zero penalty."""
+    return [
+        Token("Inception", 0, TokenRole.TITLE),
+        Token("2010", 1, TokenRole.YEAR),
+        Token("1080p", 2, TokenRole.RESOLUTION),
+        Token("BluRay", 3, TokenRole.SOURCE),
+        Token("x264", 4, TokenRole.CODEC),
+        Token("GROUP", 5, TokenRole.GROUP),
+    ]
+
+
+class TestComputeScore:
+    def test_fully_populated_movie_scores_high(self) -> None:
+        parsed = _movie()
+        score = compute_score(parsed, _all_annotated(), _KB)
+        # title 30 + media_type 20 + year 15 + resolution 5 + source 5
+        # + codec 5 + group 5 = 85
+        assert score == 85
+
+    def test_tv_show_gets_season_and_episode_weight(self) -> None:
+        parsed = ParsedRelease(
+            raw="Oz.S01E01.1080p.WEBRip.x265-KONTRAST",
+            clean="Oz.S01E01.1080p.WEBRip.x265-KONTRAST",
+            title="Oz",
+            title_sanitized="Oz",
+            year=None,
+            season=1,
+            episode=1,
+            episode_end=None,
+            quality="1080p",
+            source="WEBRip",
+            codec="x265",
+            group="KONTRAST",
+            media_type=MediaTypeToken.TV_SHOW,
+            parse_path=TokenizationRoute.DIRECT,
+        )
+        tokens = [
+            Token("Oz", 0, TokenRole.TITLE),
+            Token("S01E01", 1, TokenRole.SEASON_EPISODE),
+            Token("1080p", 2, TokenRole.RESOLUTION),
+            Token("WEBRip", 3, TokenRole.SOURCE),
+            Token("x265", 4, TokenRole.CODEC),
+            Token("KONTRAST", 5, TokenRole.GROUP),
+        ]
+        score = compute_score(parsed, tokens, _KB)
+        # title 30 + media_type 20 + season 10 + episode 5 + resolution 5
+        # + source 5 + codec 5 + group 5 = 85 (no year)
+        assert score == 85
+
+    def test_unknown_tokens_subtract_penalty(self) -> None:
+        parsed = _movie()
+        tokens = _all_annotated() + [
+            Token("noise", 6, TokenRole.UNKNOWN),
+            Token("more", 7, TokenRole.UNKNOWN),
+        ]
+        score = compute_score(parsed, tokens, _KB)
+        # 85 baseline - 2*5 unknown tokens = 75
+        assert score == 75
+
+    def test_unknown_penalty_capped(self) -> None:
+        parsed = _movie()
+        # 20 unknown tokens × 5 = 100 raw, capped at 30
+        tokens = _all_annotated() + [
+            Token(f"t{i}", 6 + i, TokenRole.UNKNOWN) for i in range(20)
+        ]
+        score = compute_score(parsed, tokens, _KB)
+        assert score == 85 - 30
+
+    def test_score_clamped_to_zero(self) -> None:
+        # Sparse parse whose token stream is entirely unknown tokens.
+        parsed = _movie(year=None, quality=None, source=None, codec=None)
+        tokens = [Token(f"t{i}", i, TokenRole.UNKNOWN) for i in range(10)]
+        score = compute_score(parsed, tokens, _KB)
+        # Expected math: title 30 + media_type 20 + group 5 = 55, minus the
+        # capped penalty of 30 = 25. Only the [0, 100] clamp is asserted here.
+        assert 0 <= score <= 100
+
+    def test_unknown_media_type_does_not_count(self) -> None:
+        parsed = _movie(media_type=MediaTypeToken.UNKNOWN)
+        score = compute_score(parsed, _all_annotated(), _KB)
+        # Loses the 20 of media_type vs baseline
+        assert score == 85 - 20
+
+    def test_unknown_group_does_not_count(self) -> None:
+        parsed = _movie(group="UNKNOWN")
+        score = compute_score(parsed, _all_annotated(), _KB)
+        assert score == 85 - 5
+
+
+# --------------------------------------------------------------------- #
+# decide_road                                                           #
+# --------------------------------------------------------------------- #
+
+
+class TestDecideRoad:
+    def test_known_schema_is_easy_regardless_of_score(self) -> None:
+        # Even a terrible score returns EASY when a schema matched.
+        assert decide_road(score=0, has_schema=True, kb=_KB) is Road.EASY
+
+    def test_no_schema_high_score_is_shitty(self) -> None:
+        assert decide_road(score=80, has_schema=False, kb=_KB) is Road.SHITTY
+
+    def test_no_schema_low_score_is_pop(self) -> None:
+        assert decide_road(score=10, has_schema=False, kb=_KB) is Road.PATH_OF_PAIN
+
+    def test_threshold_boundary_is_inclusive(self) -> None:
+        threshold = _KB.scoring["thresholds"]["shitty_min"]
+        assert decide_road(threshold, has_schema=False, kb=_KB) is Road.SHITTY
+        assert (
+            decide_road(threshold - 1, has_schema=False, kb=_KB)
+            is Road.PATH_OF_PAIN
+        )
+
+
+# --------------------------------------------------------------------- #
+# Collectors                                                            #
+# --------------------------------------------------------------------- #
+
+
+class TestCollectors:
+    def test_collect_unknown_tokens_preserves_order(self) -> None:
+        tokens = [
+            Token("A", 0, TokenRole.TITLE),
+            Token("X", 1, TokenRole.UNKNOWN),
+            Token("B", 2, TokenRole.RESOLUTION),
+            Token("Y", 3, TokenRole.UNKNOWN),
+        ]
+        assert collect_unknown_tokens(tokens) == ("X", "Y")
+
+    def test_collect_missing_critical_full(self) -> None:
+        empty = ParsedRelease(
+            raw="x",
+            clean="x",
+            title="",
+            title_sanitized="",
+            year=None,
+            season=None,
+            episode=None,
+            episode_end=None,
+            quality=None,
+            source=None,
+            codec=None,
+            group="UNKNOWN",
+            media_type=MediaTypeToken.UNKNOWN,
+            parse_path=TokenizationRoute.DIRECT,
+        )
+        assert set(collect_missing_critical(empty)) == {
+            "title",
+            "media_type",
+            "year",
+        }
+
+    def test_collect_missing_critical_none(self) -> None:
+        parsed = _movie()
+        assert collect_missing_critical(parsed) == ()
+
+
+# --------------------------------------------------------------------- #
+# End-to-end contract                                                   #
+# --------------------------------------------------------------------- #
+
+
+class TestParseReleaseReturnsReport:
+    def test_returns_tuple(self) -> None:
+        result = parse_release("Inception.2010.1080p.BluRay.x264-GROUP", _KB)
+        assert isinstance(result, tuple)
+        assert len(result) == 2
+        parsed, report = result
+        assert isinstance(parsed, ParsedRelease)
+        assert isinstance(report, ParseReport)
+
+    def test_known_group_is_easy_road(self) -> None:
+        # KONTRAST has a schema in release_groups/
+        _, report = parse_release(
+            "Oz.S03E01.1080p.WEBRip.x265-KONTRAST", _KB
+        )
+        assert report.road == Road.EASY.value
+        assert report.confidence > 0
+
+    def test_unknown_group_well_formed_is_shitty(self) -> None:
+        # No registered schema but well-formed scene name → SHITTY
+        _, report = parse_release(
+            "Inception.2010.1080p.BluRay.x264-NOSCHEMA", _KB
+        )
+        assert report.road == Road.SHITTY.value
+
+    def test_malformed_name_is_pop(self) -> None:
+        # Forbidden chars (@) — short-circuits to AI / PoP.
+        _, report = parse_release("garbage@#%name", _KB)
+        assert report.road == Road.PATH_OF_PAIN.value
+        assert report.confidence == 0
@@ -20,13 +20,21 @@ import pytest

 from alfred.domain.release.services import parse_release
 from alfred.domain.release.value_objects import ParsedRelease
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
+
+_KB = YamlReleaseKnowledge()
+
+
+def _parse(name: str) -> ParsedRelease:
+    parsed, _report = parse_release(name, _KB)
+    return parsed


 class TestParseTVEpisode:
    """Single-episode TV releases."""

    def test_basic_tv_episode(self):
-        r = parse_release("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
+        r = _parse("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
        assert r.title == "Oz"
        assert r.season == 3
        assert r.episode == 1
@@ -40,27 +48,27 @@ class TestParseTVEpisode:
        assert r.is_season_pack is False

    def test_multi_episode(self):
-        r = parse_release("Archer.S14E09E10.1080p.WEB.x265-GRP")
+        r = _parse("Archer.S14E09E10.1080p.WEB.x265-GRP")
        assert r.season == 14
        assert r.episode == 9
        assert r.episode_end == 10

    def test_nxnn_alt_form(self):
        # Alt season/episode form: 1x05 instead of S01E05.
-        r = parse_release("Some.Show.1x05.720p.HDTV.x264-GRP")
+        r = _parse("Some.Show.1x05.720p.HDTV.x264-GRP")
        assert r.season == 1
        assert r.episode == 5
        assert r.episode_end is None
        assert r.media_type == "tv_show"

    def test_nxnnxnn_multi_episode_alt_form(self):
-        r = parse_release("Some.Show.2x07x08.1080p.WEB.x265-GRP")
+        r = _parse("Some.Show.2x07x08.1080p.WEB.x265-GRP")
        assert r.season == 2
        assert r.episode == 7
        assert r.episode_end == 8

    def test_season_pack(self):
-        r = parse_release("Oz.S03.1080p.WEBRip.x265-KONTRAST")
+        r = _parse("Oz.S03.1080p.WEBRip.x265-KONTRAST")
        assert r.season == 3
        assert r.episode is None
        assert r.is_season_pack is True
@@ -71,7 +79,7 @@ class TestParseMovie:
    """Movie releases."""

    def test_basic_movie(self):
-        r = parse_release("Inception.2010.1080p.BluRay.x264-GROUP")
+        r = _parse("Inception.2010.1080p.BluRay.x264-GROUP")
        assert r.title == "Inception"
        assert r.year == 2010
        assert r.season is None
@@ -83,13 +91,13 @@ class TestParseMovie:
        assert r.media_type == "movie"

    def test_movie_multi_word_title(self):
-        r = parse_release("The.Dark.Knight.2008.2160p.UHD.BluRay.x265-TERMINAL")
+        r = _parse("The.Dark.Knight.2008.2160p.UHD.BluRay.x265-TERMINAL")
        assert r.title == "The.Dark.Knight"
        assert r.year == 2008
        assert r.quality == "2160p"

    def test_movie_without_year_still_movie_if_tech_present(self):
-        r = parse_release("UntitledFilm.1080p.WEBRip.x264-GRP")
+        r = _parse("UntitledFilm.1080p.WEBRip.x264-GRP")
        # No season, no year, but tech markers → still movie
        assert r.media_type == "movie"
        assert r.year is None
@@ -99,39 +107,39 @@ class TestParseEdgeCases:
    """Site tags, malformed names, and unknown media types."""

    def test_site_tag_prefix_stripped(self):
-        r = parse_release("[ OxTorrent.vc ] The.Title.S01E01.1080p.WEB.x265-GRP")
+        r = _parse("[ OxTorrent.vc ] The.Title.S01E01.1080p.WEB.x265-GRP")
        assert r.site_tag == "OxTorrent.vc"
        assert r.parse_path == "sanitized"
        assert r.season == 1
        assert r.episode == 1

    def test_site_tag_suffix_stripped(self):
-        r = parse_release("The.Title.S01E01.1080p.WEB.x265-NTb[TGx]")
+        r = _parse("The.Title.S01E01.1080p.WEB.x265-NTb[TGx]")
        assert r.site_tag == "TGx"
        # Suffix-tagged names are well-formed (only [] in tag → after strip clean)
        assert r.season == 1

    def test_irrecoverably_malformed(self):
        # @ is a forbidden char and not stripped by _sanitize → stays malformed
-        r = parse_release("foo@bar@baz")
+        r = _parse("foo@bar@baz")
        assert r.media_type == "unknown"
        assert r.parse_path == "ai"
        assert r.group == "UNKNOWN"

    def test_empty_unknown_when_no_evidence(self):
-        r = parse_release("Some.Random.Title")
+        r = _parse("Some.Random.Title")
        # No season, no year, no tech markers → unknown
        assert r.media_type == "unknown"

    def test_missing_group_defaults_to_unknown(self):
-        r = parse_release("Movie.2020.1080p.WEBRip.x265")
+        r = _parse("Movie.2020.1080p.WEBRip.x265")
        # No "-GROUP" suffix → group = "UNKNOWN"
        assert r.group == "UNKNOWN"

    def test_yts_bracket_release(self):
        # YTS-style: spaces, parens for year, multiple bracketed tech tokens.
        # The tokenizer must handle ' ', '(', ')', '[', ']' transparently.
-        r = parse_release("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
+        r = _parse("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
        assert r.title == "The.Father"
        assert r.year == 2020
        assert r.quality == "1080p"
@@ -141,7 +149,7 @@ class TestParseEdgeCases:

    def test_human_friendly_spaces(self):
        # Spaces as separators (no brackets).
-        r = parse_release("Inception 2010 1080p BluRay x264-GROUP")
+        r = _parse("Inception 2010 1080p BluRay x264-GROUP")
        assert r.title == "Inception"
        assert r.year == 2010
        assert r.quality == "1080p"
@@ -151,7 +159,7 @@ class TestParseEdgeCases:

    def test_underscore_separators(self):
        # Old usenet style: underscores between tokens.
-        r = parse_release("Some_Show_S01E01_1080p_WEB_x265-GRP")
+        r = _parse("Some_Show_S01E01_1080p_WEB_x265-GRP")
        assert r.season == 1
        assert r.episode == 1
        assert r.quality == "1080p"
@@ -162,15 +170,15 @@ class TestParseAudioVideoEdition:
    """Audio, video metadata, edition extraction."""

    def test_audio_codec_and_channels(self):
-        r = parse_release("Movie.2020.1080p.BluRay.DTS.5.1.x264-GRP")
+        r = _parse("Movie.2020.1080p.BluRay.DTS.5.1.x264-GRP")
        assert r.audio_channels == "5.1"

    def test_language_token(self):
-        r = parse_release("Movie.2020.MULTI.1080p.WEBRip.x265-GRP")
+        r = _parse("Movie.2020.MULTI.1080p.WEBRip.x265-GRP")
        assert "MULTI" in r.languages

    def test_edition_token(self):
-        r = parse_release("Movie.2020.UNRATED.1080p.BluRay.x264-GRP")
+        r = _parse("Movie.2020.UNRATED.1080p.BluRay.x264-GRP")
        assert r.edition == "UNRATED"


@@ -178,19 +186,21 @@ class TestParsedReleaseFolderNames:
    """Helpers that build filesystem-safe folder/filenames."""

    def _parsed_tv(self) -> ParsedRelease:
-        return parse_release("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
+        return _parse("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")

    def _parsed_movie(self) -> ParsedRelease:
-        return parse_release("Inception.2010.1080p.BluRay.x264-GROUP")
+        return _parse("Inception.2010.1080p.BluRay.x264-GROUP")

    def test_show_folder_name(self):
        r = self._parsed_tv()
        assert r.show_folder_name("Oz", 1997) == "Oz.1997.1080p.WEBRip.x265-KONTRAST"

-    def test_show_folder_name_strips_windows_chars(self):
+    def test_show_folder_name_uses_already_safe_title(self):
+        # Option B: callers sanitize at the use-case boundary via
+        # kb.sanitize_for_fs(...) before passing the title in.
        r = self._parsed_tv()
-        # Colons and question marks are Windows-forbidden — must be stripped.
-        result = r.show_folder_name("Oz: The Series?", 1997)
+        safe = _KB.sanitize_for_fs("Oz: The Series?")
+        result = r.show_folder_name(safe, 1997)
        assert ":" not in result
        assert "?" not in result

@@ -202,7 +212,7 @@ class TestParsedReleaseFolderNames:
        assert "E01" not in result

    def test_season_folder_name_multi_episode(self):
-        r = parse_release("Archer.S14E09E10E11.1080p.WEB.x265-GRP")
+        r = _parse("Archer.S14E09E10E11.1080p.WEB.x265-GRP")
        result = r.season_folder_name()
        assert "S14" in result
        assert "E09" not in result
@@ -251,21 +261,21 @@ class TestParsedReleaseInvariants:

    def test_raw_is_preserved(self):
        raw = "Oz.S03E01.1080p.WEBRip.x265-KONTRAST"
-        r = parse_release(raw)
+        r = _parse(raw)
        assert r.raw == raw

-    def test_languages_defaults_to_empty_list_not_none(self):
-        r = parse_release("Movie.2020.1080p.BluRay.x264-GRP")
-        # __post_init__ ensures languages is a list, never None
-        assert r.languages == []
+    def test_languages_defaults_to_empty_tuple_not_none(self):
+        r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
+        # ``languages`` defaults to an empty tuple (frozen VO).
+        assert r.languages == ()

    def test_tech_string_joined(self):
-        r = parse_release("Movie.2020.1080p.BluRay.x264-GRP")
+        r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
        assert r.tech_string == "1080p.BluRay.x264"

    def test_tech_string_partial(self):
        # Codec-only release (no quality/source): tech_string == codec
-        r = parse_release("Show.S01E01.x265-GRP")
+        r = _parse("Show.S01E01.x265-GRP")
        assert r.tech_string == "x265"
        assert r.codec == "x265"
        assert r.quality is None
@@ -280,4 +290,4 @@ class TestParsedReleaseInvariants:
        ],
    )
    def test_media_type_inference(self, name, expected_type):
-        assert parse_release(name).media_type == expected_type
+        assert _parse(name).media_type == expected_type
@@ -19,24 +19,38 @@ from dataclasses import asdict
 import pytest

 from alfred.domain.release.services import parse_release
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
 from tests.fixtures.releases.conftest import ReleaseFixture, discover_fixtures

+_KB = YamlReleaseKnowledge()
 FIXTURES = discover_fixtures()


+def _fixture_param(f: ReleaseFixture) -> pytest.param:
+    marks = []
+    if f.xfail_reason:
+        marks.append(pytest.mark.xfail(reason=f.xfail_reason, strict=False))
+    return pytest.param(f, id=f.name, marks=marks)
+
+
 @pytest.mark.parametrize(
    "fixture",
-    FIXTURES,
-    ids=[f.name for f in FIXTURES],
+    [_fixture_param(f) for f in FIXTURES],
 )
 def test_parse_matches_fixture(fixture: ReleaseFixture, tmp_path) -> None:
    # Materialize the tree to assert it is at least well-formed YAML +
    # plausible filesystem paths. Catches typos / missing leading dirs early.
    fixture.materialize(tmp_path)

-    result = asdict(parse_release(fixture.release_name))
-    # ``is_season_pack`` is a @property — asdict() does not include it.
-    result["is_season_pack"] = parse_release(fixture.release_name).is_season_pack
+    parsed, _report = parse_release(fixture.release_name, _KB)
+    result = asdict(parsed)
+    # ``is_season_pack`` and ``tech_string`` are @property values —
+    # ``asdict()`` does not include them.
+    result["is_season_pack"] = parsed.is_season_pack
+    result["tech_string"] = parsed.tech_string
+    # ``languages`` is a tuple on the VO; fixtures encode it as a YAML list.
+    # Compare list-to-list so the equality is unambiguous.
+    result["languages"] = list(result.get("languages", ()))

    for field, expected in fixture.expected_parsed.items():
        assert field in result, (
@@ -23,7 +23,7 @@ from unittest.mock import patch
 import pytest

 from alfred.domain.shared.ports import FileEntry
-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.domain.subtitles.services.identifier import (
    SubtitleIdentifier,
    _count_entries,
@@ -310,8 +310,8 @@ class TestSizeDisambiguation:
            detection=TypeDetectionMethod.SIZE_AND_COUNT,
        )

-    def _track(self, lang_code: str, entries: int) -> SubtitleCandidate:
-        return SubtitleCandidate(
+    def _track(self, lang_code: str, entries: int) -> SubtitleScanResult:
+        return SubtitleScanResult(
            language=SubtitleLanguage(code=lang_code, tokens=[lang_code]),
            format=None,
            subtitle_type=SubtitleType.UNKNOWN,
@@ -18,7 +18,7 @@ from __future__ import annotations

 import pytest

-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.domain.subtitles.services.matcher import SubtitleMatcher
 from alfred.domain.subtitles.value_objects import (
    SubtitleFormat,
@@ -40,8 +40,8 @@ def _track(
    stype: SubtitleType = SubtitleType.STANDARD,
    confidence: float = 1.0,
    is_embedded: bool = False,
-) -> SubtitleCandidate:
-    return SubtitleCandidate(
+) -> SubtitleScanResult:
+    return SubtitleScanResult(
        language=lang,
        format=fmt,
        subtitle_type=stype,
@@ -5,9 +5,9 @@ uncovered:

 - ``TestSubtitleFormat`` — extension matching (case-insensitive).
 - ``TestSubtitleLanguage`` — token matching (case-insensitive).
-- ``TestSubtitleCandidateDestName`` — ``destination_name`` property:
+- ``TestSubtitleScanResultDestName`` — ``destination_name`` property:
  standard / SDH / forced naming, error on missing language or format.
-- ``TestSubtitleCandidateRepr`` — debug repr for embedded vs external.
+- ``TestSubtitleScanResultRepr`` — debug repr for embedded vs external.
 - ``TestMediaSubtitleMetadata`` — ``all_tracks`` / ``total_count`` /
  ``unresolved_tracks``.
 - ``TestAvailableSubtitles`` — utility dedup by (lang, type).
@@ -24,10 +24,11 @@ from pathlib import Path
 import pytest

 from alfred.domain.subtitles.aggregates import SubtitleRuleSet
-from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleCandidate
+from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleScanResult
 from alfred.domain.subtitles.services.utils import available_subtitles
 from alfred.domain.subtitles.value_objects import (
    RuleScope,
+    RuleScopeLevel,
    SubtitleFormat,
    SubtitleLanguage,
    SubtitleMatchingRules,
@@ -73,7 +74,7 @@ class TestSubtitleLanguage:


 # --------------------------------------------------------------------------- #
-# SubtitleCandidate                                                                #
+# SubtitleScanResult                                                                #
 # --------------------------------------------------------------------------- #


@@ -81,50 +82,50 @@ SRT = SubtitleFormat(id="srt", extensions=[".srt"])
 FRA = SubtitleLanguage(code="fra", tokens=["fr", "fre"])


-class TestSubtitleCandidateDestName:
+class TestSubtitleScanResultDestName:
    def test_standard(self):
-        t = SubtitleCandidate(
+        t = SubtitleScanResult(
            language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
        )
        assert t.destination_name == "fra.srt"

    def test_sdh(self):
-        t = SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH)
+        t = SubtitleScanResult(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH)
        assert t.destination_name == "fra.sdh.srt"

    def test_forced(self):
-        t = SubtitleCandidate(
+        t = SubtitleScanResult(
            language=FRA, format=SRT, subtitle_type=SubtitleType.FORCED
        )
        assert t.destination_name == "fra.forced.srt"

    def test_unknown_treated_as_standard(self):
-        t = SubtitleCandidate(
+        t = SubtitleScanResult(
            language=FRA, format=SRT, subtitle_type=SubtitleType.UNKNOWN
        )
        # UNKNOWN doesn't add a suffix → same as standard.
        assert t.destination_name == "fra.srt"

    def test_missing_language_raises(self):
-        t = SubtitleCandidate(language=None, format=SRT)
+        t = SubtitleScanResult(language=None, format=SRT)
        with pytest.raises(ValueError, match="language or format missing"):
            t.destination_name

    def test_missing_format_raises(self):
-        t = SubtitleCandidate(language=FRA, format=None)
+        t = SubtitleScanResult(language=FRA, format=None)
        with pytest.raises(ValueError, match="language or format missing"):
            t.destination_name

    def test_extension_dot_stripped(self):
        # Format extension is ".srt" — leading dot must not be duplicated.
-        t = SubtitleCandidate(language=FRA, format=SRT)
+        t = SubtitleScanResult(language=FRA, format=SRT)
        assert t.destination_name.endswith(".srt")
        assert ".." not in t.destination_name


-class TestSubtitleCandidateRepr:
+class TestSubtitleScanResultRepr:
    def test_embedded_repr(self):
-        t = SubtitleCandidate(
+        t = SubtitleScanResult(
            language=FRA, format=None, is_embedded=True, confidence=1.0
        )
        r = repr(t)
@@ -134,14 +135,14 @@ class TestSubtitleCandidateRepr:
    def test_external_repr_uses_filename(self, tmp_path):
        f = tmp_path / "fr.srt"
        f.write_text("")
-        t = SubtitleCandidate(language=FRA, format=SRT, file_path=f, confidence=0.85)
+        t = SubtitleScanResult(language=FRA, format=SRT, file_path=f, confidence=0.85)
        r = repr(t)
        assert "fra" in r
        assert "fr.srt" in r
        assert "0.85" in r

    def test_unresolved_repr(self):
-        t = SubtitleCandidate(language=None, format=None)
+        t = SubtitleScanResult(language=None, format=None)
        r = repr(t)
        assert "?" in r

@@ -159,8 +160,8 @@ class TestMediaSubtitleMetadata:
        assert m.unresolved_tracks == []

    def test_aggregates_embedded_and_external(self):
-        e = SubtitleCandidate(language=FRA, format=None, is_embedded=True)
-        x = SubtitleCandidate(language=FRA, format=SRT, file_path=Path("/x.srt"))
+        e = SubtitleScanResult(language=FRA, format=None, is_embedded=True)
+        x = SubtitleScanResult(language=FRA, format=SRT, file_path=Path("/x.srt"))
        m = MediaSubtitleMetadata(
            media_id=None,
            media_type="movie",
@@ -173,13 +174,13 @@ class TestMediaSubtitleMetadata:
    def test_unresolved_tracks_only_external_with_none_lang(self):
        # An embedded with None language must NOT appear in unresolved_tracks
        # (the property only iterates external_tracks).
-        embedded_unknown = SubtitleCandidate(
+        embedded_unknown = SubtitleScanResult(
            language=None, format=None, is_embedded=True
        )
-        external_known = SubtitleCandidate(
+        external_known = SubtitleScanResult(
            language=FRA, format=SRT, file_path=Path("/a.srt")
        )
-        external_unknown = SubtitleCandidate(
+        external_unknown = SubtitleScanResult(
            language=None, format=SRT, file_path=Path("/b.srt")
        )
        m = MediaSubtitleMetadata(
@@ -200,14 +201,14 @@ class TestAvailableSubtitles:
    def test_dedup_by_lang_and_type(self):
        ENG = SubtitleLanguage(code="eng", tokens=["en"])
        tracks = [
-            SubtitleCandidate(
+            SubtitleScanResult(
                language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
            ),
-            SubtitleCandidate(
+            SubtitleScanResult(
                language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
            ),
-            SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH),
-            SubtitleCandidate(
+            SubtitleScanResult(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH),
+            SubtitleScanResult(
                language=ENG, format=SRT, subtitle_type=SubtitleType.STANDARD
            ),
        ]
@@ -221,10 +222,10 @@ class TestAvailableSubtitles:

    def test_none_language_treated_as_key(self):
        # Tracks with no language form a single None-keyed bucket.
-        t1 = SubtitleCandidate(
+        t1 = SubtitleScanResult(
            language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
        )
-        t2 = SubtitleCandidate(
+        t2 = SubtitleScanResult(
            language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
        )
        result = available_subtitles([t1, t2])
@@ -257,7 +258,7 @@ class TestSubtitleRuleSet:
    def test_override_partial_keeps_parent_for_unset_fields(self):
        parent = SubtitleRuleSet.global_default()
        child = SubtitleRuleSet(
-            scope=RuleScope(level="show", identifier="tt1"),
+            scope=RuleScope(level=RuleScopeLevel.SHOW, identifier="tt1"),
            parent=parent,
        )
        child.override(languages=["jpn"])
@@ -267,14 +268,14 @@ class TestSubtitleRuleSet:
        assert rules.min_confidence == parent.resolve(_DEFAULT_RULES).min_confidence

    def test_to_dict_only_emits_set_deltas(self):
-        rs = SubtitleRuleSet(scope=RuleScope(level="show", identifier="tt1"))
+        rs = SubtitleRuleSet(scope=RuleScope(level=RuleScopeLevel.SHOW, identifier="tt1"))
        rs.override(languages=["fra"])
        out = rs.to_dict()
        assert out["scope"] == {"level": "show", "identifier": "tt1"}
        assert out["override"] == {"languages": ["fra"]}

    def test_to_dict_full_override(self):
-        rs = SubtitleRuleSet(scope=RuleScope(level="global"))
+        rs = SubtitleRuleSet(scope=RuleScope(level=RuleScopeLevel.GLOBAL))
        rs.override(
            languages=["fra"],
            formats=["srt"],
@@ -39,6 +39,14 @@ class ReleaseFixture:
    def routing(self) -> dict:
        return self.data.get("routing", {})

+    @property
+    def xfail_reason(self) -> str | None:
+        """If set, the fixture is expected to fail — wrapped with
+        ``pytest.mark.xfail`` by the test runner. Used for known
+        not-supported pathological cases (typically PATH OF PAIN bucket).
+        """
+        return self.data.get("xfail_reason")
+
    def materialize(self, root: Path) -> None:
        """Create the fixture's ``tree`` as empty files/dirs under ``root``."""
        for entry in self.tree:
@@ -1,5 +1,10 @@
 release_name: "Deutschland 83-86-89 (2015) Season 1-3 S01-S03 (1080p BluRay x265 HEVC 10bit AAC 5.1 German Kappa)"

+# Out of SHITTY scope by design: parenthesized tech blocks, group name as
+# the last bare word inside parens, year-suffix range in title, dual
+# season expression. PATH OF PAIN handles this via LLM pre-analysis.
+xfail_reason: "PoP-grade pathological franchise box-set, beyond simple-dict SHITTY"
+
 # Pathological franchise box-set:
 # - Title contains year-suffix range "83-86-89" (3 years glued)
 # - Season range expressed twice: "Season 1-3" AND "S01-S03"
@@ -1,13 +1,15 @@
 release_name: "Khruangbin ｜ Austin City Limits Music Festival 2024 ｜ Full Set [V_-7WWPPeBs].webm"

 # yt-dlp slug: UTF-8 wide pipe '｜' (U+FF5C, not the ASCII '|'), trailing
-# YouTube video ID in brackets, .webm extension. Parser extracts the year
-# (2024) correctly but mistakes the YouTube ID '7WWPPeBs' for a release
-# group, and the wide pipe survives the tokenizer (not a separator).
+# YouTube video ID in brackets, .webm extension. The wide pipe survives
+# the tokenizer (not a separator) but is now dropped at title assembly
+# (pure-punctuation TITLE tokens carry no content). Year (2024) parses
+# correctly; the YouTube ID '7WWPPeBs' is still mistaken for a release
+# group (separate gap, see PoP backlog).
 # This is a concert recording — closer to "live music" than "movie", but
 # media_type=movie is the current degenerate best guess.
 parsed:
-  title: "Khruangbin.｜.Austin.City.Limits.Music.Festival"
+  title: "Khruangbin.Austin.City.Limits.Music.Festival"
  year: 2024
  season: null
  episode: null
@@ -1,5 +1,10 @@
 release_name: "Predator Badlands 2025 1080p HDRip HEVC x265 BONE"

+# Space-separated release with both codec aliases present (HEVC + x265)
+# and no dash-before-group. Simple-SHITTY first-wins picks HEVC, expected
+# was x265 (legacy last-wins). Reclassified PoP.
+xfail_reason: "Space-separated, dual codec aliases, no dashed group"
+
 # Space-separated release: tokenizer correctly splits and identifies year +
 # tech, but the dash-before-group convention is absent so 'BONE' is not
 # recognized as the group — falls to UNKNOWN. Anti-regression baseline.
@@ -1,5 +1,9 @@
 release_name: "SLEAFORD MODS   Live Glastonbury June 27th 2015-niNjHn8abyY.mp4"

+# YouTube-style slug with year-prefixed video-id dash suffix. Not a scene
+# release shape at all — PATH OF PAIN.
+xfail_reason: "YouTube slug with year-prefixed video-id, not a scene shape"
+
 # yt-dlp filename: triple space between band name and event, no canonical
 # tech markers, dashed YouTube video ID glued to the year, .mp4 extension
 # preserved in the title. Parser:
@@ -1,5 +1,10 @@
 release_name: "Super Mario Bros. le film [FR-EN] (2023).mkv"

+# Bare-dashed language pair interior to the title (``[FR-EN]``) is tagged
+# as group by ``_detect_group``, leaving the title fragment behind.
+# Out of simple-SHITTY scope.
+xfail_reason: "Interior bare-dashed language pair confuses group detection"
+
 # Hybrid English/French marketing title with:
 # - Trailing period after 'Bros' that is part of the title abbreviation
 #   (not a separator), but tokenizer treats it as one
@@ -1,28 +1,26 @@
 release_name: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"

-# Apocalypse case combining every horror:
-# - Unescaped apostrophe ("World's") → forces parse_path="ai" fallback
-# - Spaces AND dashes used as separators inconsistently
-# - "Blu-ray" with a dash (vs. canonical BluRay)
-# - "1080i" interlaced flag (not 1080p)
-# - "DTS-HD MA 5.1" multi-word audio codec
-# - " - GROUP.mkv" trailing format (space-dash-space before group)
+# Apocalypse case combining every horror — partially tamed by the
+# apostrophe fix. Remaining gaps (still PoP-worthy):
+# - "1080i" interlaced flag (not in quality KB)
+# - "Blu-ray" with a dash (vs. canonical BluRay) — recognized as source
+#   but with the dash form
+# - "DTS-HD MA 5.1" multi-word audio codec — the trailing "HD" leaks
+#   into the group
 # - Trailing .mkv extension survives in title
-# Result: total degeneration — UNKNOWN across the board, title=raw input.
-# Once the apostrophe + multi-word-audio + 1080i are handled this fixture
-# should be revisited. For now: anti-regression of the failure shape.
+# - " - GROUP" trailing format (space-dash-space before group)
 parsed:
-  title: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
-  year: null
+  title: "The.Prodigy.Worlds.on.Fire"
+  year: 2011
  season: null
  episode: null
  quality: null
-  source: null
-  codec: null
-  group: "UNKNOWN"
-  tech_string: ""
-  media_type: "unknown"
-  parse_path: "ai"
+  source: "Blu-ray"
+  codec: "AVC"
+  group: "HD"
+  tech_string: "Blu-ray.AVC"
+  media_type: "movie"
+  parse_path: "sanitized"
  is_season_pack: false

 tree:
@@ -1,14 +1,13 @@
 release_name: "Archer.S14E09E10E11.1080p.WEB.h264-ETHEL"

-# Tech debt: triple-episode chain (E09E10E11) — current parser captures
-# episode=9 and episode_end=10, but E11 is lost. Anti-regression: lock in
-# the partial behavior so any future improvement is intentional.
+# Triple-episode chain (E09E10E11) — the parser collapses the chain to a
+# range (episode=first, episode_end=last). Intermediate values are implied.
 parsed:
  title: "Archer"
  year: null
  season: 14
  episode: 9
-  episode_end: 10
+  episode_end: 11
  quality: "1080p"
  source: "WEB"
  codec: "h264"
@@ -1,21 +1,22 @@
 release_name: "Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265.EAC3.5.1-Amen"

-# Tech debt: the unescaped apostrophe in "Don't" pushes the whole release
-# through the AI fallback path (parse_path="ai") and the parse degenerates to
-# UNKNOWN across the board. Anti-regression here — once the tokenizer learns
-# to handle apostrophes, this fixture should be revisited.
+# Apostrophes inside titles ("Don't", "L'avare") used to push the release
+# through the AI fallback (parse_path="ai", everything UNKNOWN). They are
+# now pre-stripped before the well-formed check and tokenization, so the parse
+# completes normally — only the title text loses its apostrophe
+# ("Honey.Dont").
 parsed:
-  title: "Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265.EAC3.5.1-Amen"
-  year: null
+  title: "Honey.Dont"
+  year: 2025
  season: null
  episode: null
-  quality: null
-  source: null
-  codec: null
-  group: "UNKNOWN"
-  tech_string: ""
-  media_type: "unknown"
-  parse_path: "ai"
+  quality: "2160p"
+  source: "WEBRip"
+  codec: "x265"
+  group: "Amen"
+  tech_string: "2160p.WEBRip.x265"
+  media_type: "movie"
+  parse_path: "sanitized"
  is_season_pack: false

 tree:
@@ -1,7 +1,8 @@
 release_name: "Notre.planete.s01e01.1080p.NF.WEB-DL.DDP5.1.x264-NTb"

 # Lowercase 's01e01' and lowercased title word ('planete') correctly parsed.
-# NF (Netflix) source tag is not in the source KB — drops; WEB-DL wins.
+# NF is the Netflix streaming distributor (separate dimension from source);
+# WEB-DL is the encoding source.
 parsed:
  title: "Notre.planete"
  year: null
@@ -11,6 +12,7 @@ parsed:
  source: "WEB-DL"
  codec: "x264"
  group: "NTb"
+  distributor: "NF"
  tech_string: "1080p.WEB-DL.x264"
  media_type: "tv_show"
  parse_path: "direct"
@@ -1,22 +1,22 @@
 release_name: "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE"

-# Tech debt: range syntax 'S01-06' is not recognized as TV — falls through
-# to media_type=movie with the range glued onto the title. Captured here so a
-# future ranger-aware parser change is intentional.
+# Range syntax 'S01-06' is now recognized as a season-range marker:
+# season=1 (first of the range), media_type=tv_complete, and the token
+# no longer leaks into the title.
 parsed:
-  title: "Der.Tatortreiniger.S01-06"
+  title: "Der.Tatortreiniger"
  year: null
-  season: null
+  season: 1
  episode: null
  quality: "1080p"
  source: "WEB"
  codec: "x264"
  group: "WAYNE"
  tech_string: "1080p.WEB.x264"
-  media_type: "movie"
+  media_type: "tv_complete"
  languages: ["GERMAN"]
  parse_path: "direct"
-  is_season_pack: false
+  is_season_pack: true

 tree:
  - "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE/"
@@ -1,11 +1,12 @@
 release_name: "Vinyl - 1x01 - FHD"

-# Tech debt: surrounding ' - ' separators leave a stray '-' token attached
-# to the title ("Vinyl.-"). NxNN form correctly identifies S01E01; everything
-# tech-side empty (no quality token in KB — "FHD" not yet known). Anti-regression
-# the current degenerate title so a future fix is intentional.
+# Surrounding ' - ' separators in human-friendly release names left stray
+# '-' tokens attached to the title. They are now dropped at assembly time
+# (pure-punctuation TITLE tokens carry no content). NxNN form correctly
+# identifies S01E01; tech-side stays empty (no quality token in KB — "FHD"
+# not yet known).
 parsed:
-  title: "Vinyl.-"
+  title: "Vinyl"
  year: null
  season: 1
  episode: 1
@@ -0,0 +1,155 @@
+"""Tests for :class:`FfprobeMediaProber`.
+
+Covers the full-probe path (``probe()`` returning a ``MediaInfo``) by
+patching ``subprocess.run`` at the adapter module level. The
+subtitle-streams path is exercised by the subtitle domain tests via
+the same adapter.
+"""
+
+from __future__ import annotations
+
+import json
+import subprocess
+from unittest.mock import MagicMock, patch
+
+from alfred.infrastructure.probe import FfprobeMediaProber
+
+_PROBER = FfprobeMediaProber()
+_PATCH_TARGET = "alfred.infrastructure.probe.ffprobe_prober.subprocess.run"
+
+
+def _ffprobe_result(returncode=0, stdout="{}", stderr="") -> MagicMock:
+    return MagicMock(returncode=returncode, stdout=stdout, stderr=stderr)
+
+
+class TestProbe:
+    def test_timeout_returns_none(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        with patch(
+            _PATCH_TARGET,
+            side_effect=subprocess.TimeoutExpired(cmd="ffprobe", timeout=30),
+        ):
+            assert _PROBER.probe(f) is None
+
+    def test_nonzero_returncode_returns_none(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(returncode=1, stderr="not a media file"),
+        ):
+            assert _PROBER.probe(f) is None
+
+    def test_invalid_json_returns_none(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(stdout="not json {"),
+        ):
+            assert _PROBER.probe(f) is None
+
+    def test_parses_format_duration_and_bitrate(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        payload = {
+            "format": {"duration": "1234.5", "bit_rate": "5000000"},
+            "streams": [],
+        }
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(stdout=json.dumps(payload)),
+        ):
+            info = _PROBER.probe(f)
+        assert info is not None
+        assert info.duration_seconds == 1234.5
+        assert info.bitrate_kbps == 5000  # bit_rate // 1000
+
+    def test_invalid_numeric_format_fields_skipped(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        payload = {
+            "format": {"duration": "garbage", "bit_rate": "also-bad"},
+            "streams": [],
+        }
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(stdout=json.dumps(payload)),
+        ):
+            info = _PROBER.probe(f)
+        assert info is not None
+        assert info.duration_seconds is None
+        assert info.bitrate_kbps is None
+
+    def test_parses_streams(self, tmp_path):
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        payload = {
+            "format": {},
+            "streams": [
+                {
+                    "index": 0,
+                    "codec_type": "video",
+                    "codec_name": "h264",
+                    "width": 1920,
+                    "height": 1080,
+                },
+                {
+                    "index": 1,
+                    "codec_type": "audio",
+                    "codec_name": "ac3",
+                    "channels": 6,
+                    "channel_layout": "5.1",
+                    "tags": {"language": "eng"},
+                    "disposition": {"default": 1},
+                },
+                {
+                    "index": 2,
+                    "codec_type": "audio",
+                    "codec_name": "aac",
+                    "channels": 2,
+                    "tags": {"language": "fra"},
+                },
+                {
+                    "index": 3,
+                    "codec_type": "subtitle",
+                    "codec_name": "subrip",
+                    "tags": {"language": "fra"},
+                    "disposition": {"forced": 1},
+                },
+            ],
+        }
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(stdout=json.dumps(payload)),
+        ):
+            info = _PROBER.probe(f)
+        assert info.video_codec == "h264"
+        assert info.width == 1920 and info.height == 1080
+        assert len(info.audio_tracks) == 2
+        eng = info.audio_tracks[0]
+        assert eng.language == "eng"
+        assert eng.is_default is True
+        assert info.audio_tracks[1].is_default is False
+        assert len(info.subtitle_tracks) == 1
+        assert info.subtitle_tracks[0].is_forced is True
+
+    def test_first_video_stream_wins(self, tmp_path):
+        # Only the FIRST video stream populates the video fields (codec, width).
+        f = tmp_path / "x.mkv"
+        f.write_bytes(b"")
+        payload = {
+            "format": {},
+            "streams": [
+                {"codec_type": "video", "codec_name": "h264", "width": 1920},
+                {"codec_type": "video", "codec_name": "hevc", "width": 3840},
+            ],
+        }
+        with patch(
+            _PATCH_TARGET,
+            return_value=_ffprobe_result(stdout=json.dumps(payload)),
+        ):
+            info = _PROBER.probe(f)
+        assert info.video_codec == "h264"
+        assert info.width == 1920
@@ -1,21 +1,19 @@
 """Tests for the smaller ``alfred.infrastructure.filesystem`` helpers.

-Covers four siblings of ``FileManager`` that had near-zero coverage:
+Covers three siblings of ``FileManager`` that had near-zero coverage:

- ``ffprobe.probe`` — wraps ``ffprobe`` JSON output into a ``MediaInfo``.
 - ``filesystem_operations.create_folder`` / ``move`` — thin
  ``mkdir`` / ``mv`` wrappers returning dict-shaped responses.
 - ``organizer.MediaOrganizer`` — computes destination paths for movies
  and TV episodes; creates folders for them.
 - ``find_video.find_video_file`` — first-video lookup in a folder.

-External commands (``ffprobe`` / ``mv``) are patched via ``subprocess.run``.
+(``ffprobe`` coverage now lives in ``test_ffprobe_prober.py`` alongside
+its adapter.)
 """

 from __future__ import annotations

-import json
-import subprocess
 from unittest.mock import MagicMock, patch

 from alfred.domain.movies.entities import Movie
@@ -27,154 +25,15 @@ from alfred.domain.tv_shows.value_objects import (
    SeasonNumber,
    ShowStatus,
 )
-from alfred.infrastructure.filesystem import ffprobe
 from alfred.infrastructure.filesystem.filesystem_operations import (
    create_folder,
    move,
 )
 from alfred.infrastructure.filesystem.find_video import find_video_file
 from alfred.infrastructure.filesystem.organizer import MediaOrganizer
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge

-# --------------------------------------------------------------------------- #
-# ffprobe.probe                                                                #
-# --------------------------------------------------------------------------- #
-
-
-def _ffprobe_result(returncode=0, stdout="{}", stderr="") -> MagicMock:
-    return MagicMock(returncode=returncode, stdout=stdout, stderr=stderr)
-
-
-class TestFfprobe:
-    def test_timeout_returns_none(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            side_effect=subprocess.TimeoutExpired(cmd="ffprobe", timeout=30),
-        ):
-            assert ffprobe.probe(f) is None
-
-    def test_nonzero_returncode_returns_none(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(returncode=1, stderr="not a media file"),
-        ):
-            assert ffprobe.probe(f) is None
-
-    def test_invalid_json_returns_none(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(stdout="not json {"),
-        ):
-            assert ffprobe.probe(f) is None
-
-    def test_parses_format_duration_and_bitrate(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        payload = {
-            "format": {"duration": "1234.5", "bit_rate": "5000000"},
-            "streams": [],
-        }
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(stdout=json.dumps(payload)),
-        ):
-            info = ffprobe.probe(f)
-        assert info is not None
-        assert info.duration_seconds == 1234.5
-        assert info.bitrate_kbps == 5000  # bit_rate // 1000
-
-    def test_invalid_numeric_format_fields_skipped(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        payload = {
-            "format": {"duration": "garbage", "bit_rate": "also-bad"},
-            "streams": [],
-        }
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(stdout=json.dumps(payload)),
-        ):
-            info = ffprobe.probe(f)
-        assert info is not None
-        assert info.duration_seconds is None
-        assert info.bitrate_kbps is None
-
-    def test_parses_streams(self, tmp_path):
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        payload = {
-            "format": {},
-            "streams": [
-                {
-                    "index": 0,
-                    "codec_type": "video",
-                    "codec_name": "h264",
-                    "width": 1920,
-                    "height": 1080,
-                },
-                {
-                    "index": 1,
-                    "codec_type": "audio",
-                    "codec_name": "ac3",
-                    "channels": 6,
-                    "channel_layout": "5.1",
-                    "tags": {"language": "eng"},
-                    "disposition": {"default": 1},
-                },
-                {
-                    "index": 2,
-                    "codec_type": "audio",
-                    "codec_name": "aac",
-                    "channels": 2,
-                    "tags": {"language": "fra"},
-                },
-                {
-                    "index": 3,
-                    "codec_type": "subtitle",
-                    "codec_name": "subrip",
-                    "tags": {"language": "fra"},
-                    "disposition": {"forced": 1},
-                },
-            ],
-        }
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(stdout=json.dumps(payload)),
-        ):
-            info = ffprobe.probe(f)
-        assert info.video_codec == "h264"
-        assert info.width == 1920 and info.height == 1080
-        assert len(info.audio_tracks) == 2
-        eng = info.audio_tracks[0]
-        assert eng.language == "eng"
-        assert eng.is_default is True
-        assert info.audio_tracks[1].is_default is False
-        assert len(info.subtitle_tracks) == 1
-        assert info.subtitle_tracks[0].is_forced is True
-
-    def test_first_video_stream_wins(self, tmp_path):
-        # The implementation only fills video_codec on the FIRST video stream.
-        f = tmp_path / "x.mkv"
-        f.write_bytes(b"")
-        payload = {
-            "format": {},
-            "streams": [
-                {"codec_type": "video", "codec_name": "h264", "width": 1920},
-                {"codec_type": "video", "codec_name": "hevc", "width": 3840},
-            ],
-        }
-        with patch(
-            "alfred.infrastructure.filesystem.ffprobe.subprocess.run",
-            return_value=_ffprobe_result(stdout=json.dumps(payload)),
-        ):
-            info = ffprobe.probe(f)
-        assert info.video_codec == "h264"
-        assert info.width == 1920
+_KB = YamlReleaseKnowledge()


 # --------------------------------------------------------------------------- #
@@ -263,35 +122,35 @@ class TestFindVideo:
    def test_returns_file_directly_when_video(self, tmp_path):
        f = tmp_path / "Movie.mkv"
        f.write_bytes(b"")
-        assert find_video_file(f) == f
+        assert find_video_file(f, _KB) == f

    def test_returns_none_when_file_is_not_video(self, tmp_path):
        f = tmp_path / "notes.txt"
        f.write_text("x")
-        assert find_video_file(f) is None
+        assert find_video_file(f, _KB) is None

    def test_returns_none_when_folder_has_no_video(self, tmp_path):
        (tmp_path / "a.txt").write_text("x")
-        assert find_video_file(tmp_path) is None
+        assert find_video_file(tmp_path, _KB) is None

    def test_returns_first_sorted_video(self, tmp_path):
        (tmp_path / "B.mkv").write_bytes(b"")
        (tmp_path / "A.mkv").write_bytes(b"")
        (tmp_path / "C.mkv").write_bytes(b"")
-        found = find_video_file(tmp_path)
+        found = find_video_file(tmp_path, _KB)
        assert found.name == "A.mkv"

    def test_recurses_into_subfolders(self, tmp_path):
        sub = tmp_path / "sub"
        sub.mkdir()
        (sub / "X.mkv").write_bytes(b"")
-        found = find_video_file(tmp_path)
+        found = find_video_file(tmp_path, _KB)
        assert found is not None and found.name == "X.mkv"

    def test_case_insensitive_extension(self, tmp_path):
        f = tmp_path / "Movie.MKV"
        f.write_bytes(b"")
-        assert find_video_file(f) == f
+        assert find_video_file(f, _KB) == f


 # --------------------------------------------------------------------------- #
@@ -0,0 +1,82 @@
+"""Tests for ``LanguageRegistry`` — the YAML-backed adapter for the
+:class:`alfred.domain.shared.ports.LanguageRepository` port.
+
+The port is structural (Protocol), so the assertion that the adapter
+satisfies it is a static one — we exercise the public surface here and
+let mypy / runtime polymorphism do the rest.
+"""
+
+from __future__ import annotations
+
+from alfred.domain.shared.ports import LanguageRepository
+from alfred.domain.shared.value_objects import Language
+from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
+
+
+def _registry() -> LanguageRepository:
+    """Return a fresh registry typed as the port — proves structural fit."""
+    return LanguageRegistry()
+
+
+class TestPortSurface:
+    def test_satisfies_protocol(self):
+        # If LanguageRegistry diverged from LanguageRepository, the annotation
+        # below would already be wrong at type-check time; at runtime, this
+        # just confirms the methods exist.
+        reg: LanguageRepository = LanguageRegistry()
+        assert hasattr(reg, "from_iso")
+        assert hasattr(reg, "from_any")
+        assert hasattr(reg, "all")
+
+    def test_len_reflects_loaded_entries(self):
+        reg = _registry()
+        # The built-in YAML ships dozens of languages; the exact count drifts
+        # with knowledge updates, so just sanity-check it's non-empty.
+        assert len(reg) > 0
+
+
+class TestFromIso:
+    def test_known_iso_returns_language(self):
+        reg = _registry()
+        fre = reg.from_iso("fre")
+        assert isinstance(fre, Language)
+        assert fre.iso == "fre"
+
+    def test_case_insensitive(self):
+        reg = _registry()
+        assert reg.from_iso("FRE") == reg.from_iso("fre")
+
+    def test_unknown_iso_returns_none(self):
+        assert _registry().from_iso("zzz") is None
+
+    def test_non_string_returns_none(self):
+        assert _registry().from_iso(None) is None  # type: ignore[arg-type]
+
+
+class TestFromAny:
+    def test_english_name(self):
+        reg = _registry()
+        lang = reg.from_any("French")
+        assert lang is not None
+        assert lang.iso == "fre"
+
+    def test_iso_639_1_alias(self):
+        # "fr" is the 639-1 form, registered as an alias.
+        reg = _registry()
+        lang = reg.from_any("fr")
+        assert lang is not None
+        assert lang.iso == "fre"
+
+    def test_unknown_returns_none(self):
+        assert _registry().from_any("vostfr") is None
+
+    def test_non_string_returns_none(self):
+        assert _registry().from_any(123) is None  # type: ignore[arg-type]
+
+
+class TestMembership:
+    def test_contains_known(self):
+        assert "english" in _registry()
+
+    def test_does_not_contain_unknown(self):
+        assert "klingon" not in _registry()
@@ -16,7 +16,7 @@ from __future__ import annotations

 from pathlib import Path

-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.application.subtitles.placer import PlacedTrack
 from alfred.domain.subtitles.value_objects import (
    SubtitleFormat,
@@ -32,8 +32,8 @@ ENG = SubtitleLanguage(code="eng", tokens=["en"])

 def _track(
    lang=FRA, *, embedded: bool = False, confidence: float = 0.92
-) -> SubtitleCandidate:
-    return SubtitleCandidate(
+) -> SubtitleScanResult:
+    return SubtitleScanResult(
        language=lang,
        format=SRT,
        subtitle_type=SubtitleType.STANDARD,