Compare commits: 9f10f4e0ad..main (45 commits)
| SHA1 |
|---|
| 02e478a157 |
| 3dc73a5214 |
| 88f156b7a4 |
| 5107cb32c0 |
| b7979c0f8b |
| 9f1ce94690 |
| 5e0ed11672 |
| 0246f85ef8 |
| e62dc90bd1 |
| 688c37bbec |
| 757e4045ee |
| c3767aacb6 |
| 5bcf22b408 |
| cfa9f54d9f |
| f0aaf50c97 |
| a09262b33f |
| 9c7cd66d2b |
| 83dbed887b |
| 0c9489e16b |
| 621bb96995 |
| 448ef3b79c |
| b1c7f35ffb |
| 5bbdc9081f |
| 5d7b214af2 |
| 18267d0165 |
| 19fe8a519a |
| a0d1846ff2 |
| 0fb59a4581 |
| e79ca462b8 |
| 03aa844d7d |
| c303efea48 |
| 5db350a1df |
| 12dc796ea2 |
| 9ddd85929e |
| ed7680b58f |
| b4c9efd13b |
| 98c688f29b |
| fcd80763e2 |
| 629387591f |
| 230a7ab88a |
| 3737f66851 |
| fd3bd1ad8c |
| 7dc7f0c241 |
| 075a827b0e |
| a2c917618f |
+380
@@ -15,8 +15,372 @@ callers).

 ## [Unreleased]

+### Fixed
+
+- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
+  range.** The parser previously captured `episode=9, episode_end=10`
+  and dropped E11+. It now returns `episode=first, episode_end=last`,
+  with intermediate values implied. Fixture
+  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
+  to anti-regression-of-fix.
+- **Apostrophes in titles no longer push the release through the AI
+  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
+  previously parsed with `parse_path="ai"` and everything UNKNOWN
+  because `'` is in the forbidden-chars list. Apostrophes are now
+  pre-stripped before the well-formed check, so the parse completes
+  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
+  the title text loses its apostrophe. `parse_path` becomes
+  `sanitized` to surface the cleanup. Side win: PoP fixture
+  `the_prodigy_full_chaos/` also moves from total failure to a
+  partially-correct parse (year, source, codec extracted).
+- **Season-range markers (`Sxx-yy`) are now recognized as
+  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
+  parsed as `media_type=movie` with `S01-06` glued onto the title.
+  The parser now recognizes the range, sets `season=first`,
+  `media_type=tv_complete`, and removes the marker from the title.
+  `is_season_pack` flips to `true`.
+- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
+  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
+  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
+  `｜`, …) carry no title content and are now filtered out. Side
+  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
+  YouTube wide-pipe no longer leaks into the title.
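
As an illustration of the multi-episode fix listed first in this section, the collapse of a chained marker can be sketched with plain string operations. This is a minimal sketch under stated assumptions, not the parser's actual code: `EpisodeMarker`, `_is_number`, and `collapse_episode_chain` are hypothetical names, and returning `episode_end=None` for a single-episode marker is an assumption.

```python
from typing import NamedTuple, Optional


class EpisodeMarker(NamedTuple):
    """Hypothetical result type; the real parser stores these on ParsedRelease."""

    season: int
    episode: int
    episode_end: Optional[int]  # assumption: None when the marker names one episode


def _is_number(text: str) -> bool:
    # ASCII digits only, so the int() calls below cannot fail.
    return text.isascii() and text.isdigit()


def collapse_episode_chain(token: str) -> Optional[EpisodeMarker]:
    """Turn 'S14E09E10E11' into season=14, episode=9, episode_end=11."""
    upper = token.upper()
    if not upper.startswith("S"):
        return None
    # 'S14E09E10E11' -> season_part='14', episode_parts=['09', '10', '11']
    season_part, *episode_parts = upper[1:].split("E")
    if not _is_number(season_part) or not episode_parts:
        return None
    if not all(_is_number(part) for part in episode_parts):
        return None
    episodes = [int(part) for part in episode_parts]
    # Keep only the first and last episode; intermediate values are implied.
    episode_end = episodes[-1] if len(episodes) > 1 else None
    return EpisodeMarker(int(season_part), episodes[0], episode_end)
```

Tokens that are not a season/episode marker (for example `Vinyl` or a bare `S01`) return `None` in this sketch, leaving them to whatever other rule handles them.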
+
 ### Added

+- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
+  token separator.** Added to `alfred/knowledge/release/separators.yaml`
+  so CJK release names (and the occasional decorative YouTube-style use)
+  tokenize cleanly instead of leaving the wide pipe glued onto an
+  adjacent token. The tokenizer in
+  `alfred/domain/release/parser/pipeline.py` already iterates the
+  separator list as plain strings (no regex), so a multi-byte UTF-8
+  separator works without any code change.
+
+- **`InspectedResult.recommended_action` property** — derived hint that
+  collapses the orchestrator's go / wait / skip decision into a single
+  value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
+  exclusion logic that was previously dispersed across road /
+  media_type / main_video checks at each call site. Ordering is part of
+  the contract: ``skip`` (no main video, or media_type == ``"other"``)
+  wins over ``ask_user`` (media_type == ``"unknown"`` or road ==
+  ``"path_of_pain"``) which wins over ``process``. Surfaced through the
+  ``analyze_release`` tool so the LLM can route on it directly.
+  6 new tests in ``tests/application/test_inspect.py`` cover the four
+  branches and the precedence rules.
+- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
+  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
+  — the surface previously coupled to the concrete `LanguageRegistry`.
+  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
+  depends on the Protocol, infrastructure provides the YAML-backed
+  adapter. Tests in `tests/infrastructure/test_language_registry.py`.
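
The precedence contract described for `recommended_action` in this section (`skip` over `ask_user` over `process`) can be sketched as a derived property. This is a rough illustration only: `InspectedSketch` and its three flat fields are hypothetical stand-ins, whereas the real `InspectedResult` carries `parsed`, `report`, `main_video` and more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InspectedSketch:
    """Hypothetical, flattened stand-in for InspectedResult."""

    media_type: str            # e.g. "movie", "unknown", "other"
    road: str                  # "easy" / "shitty" / "path_of_pain"
    main_video: Optional[str]  # None when no eligible video was found

    @property
    def recommended_action(self) -> str:
        # 1. skip wins: nothing to process, or not a media release at all.
        if self.main_video is None or self.media_type == "other":
            return "skip"
        # 2. ask_user: the parser could not classify or is not confident.
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        # 3. Everything else is safe to process.
        return "process"
```

Because the checks run top to bottom, a release with no main video is skipped even when its road is `path_of_pain`, which is the ordering the changelog entry calls part of the contract.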
+
+### Changed
+
+- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
+  hold their track collections as `tuple[AudioTrack, ...]` and
+  `tuple[SubtitleTrack, ...]` instead of mutable lists, and are
+  `@dataclass(frozen=True, eq=False)` (identity-based equality
+  preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
+  `object.__setattr__` for the `imdb_id` / `title` /
+  `season_number` / `episode_number` normalizations. To project
+  enrichment results (probe output, file metadata) callers now rebuild
+  via `dataclasses.replace(...)`. Pattern aligned with the recent
+  `ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
+  `tuple` accordingly. `Season` and `TVShow` remain mutable for now:
+  freezing the aggregate root would cascade a full reconstruction on
+  every `add_episode`, so that change is deferred.
+- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
+  conflated "this might become a placed subtitle" with "this is what a
+  scan pass produced". The class is the output of a scan/identify pass
+  — language/format may still be `None`, confidence reflects how sure
+  the classifier is, and `raw_tokens` holds the filename fragments
+  under analysis. `SubtitleScanResult` says that directly. Pure rename
+  with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
+  no behavior change. Touches the domain entity + `__init__` export,
+  the matcher / identifier / utils services, the manage_subtitles use
+  case, the placer, the metadata store, the shared-media cross-ref
+  comment, and the seven test modules that imported the type.
+
+- **`ParsedRelease` is now frozen; enrichment passes return new
+  instances.** The VO was mutable so `detect_media_type` and
+  `enrich_from_probe` could patch fields in place — a code smell in a
+  value object whose identity *is* its content. `ParsedRelease` is now
+  `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
+  instead of a `list[str]`. `enrich_from_probe` returns a new
+  `ParsedRelease` via `dataclasses.replace` (only allocates when at
+  least one field actually changed). `inspect_release` rebinds
+  `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
+  to satisfy the strict isinstance check that now also runs on
+  replace) and `enrich_from_probe`. Parser pipeline now packs
+  `languages` as a tuple in the assemble dict. Callers updated:
+  `inspect_release`, `testing/recognize_folders_in_downloads.py`, and
+  the enrichment tests (22 call sites + language assertions switched
+  to tuple literals).
+- **`resolve_destination` use cases take `kb` / `prober` as required
+  params; module-level singletons gone.** The four
+  `resolve_{season,episode,movie,series}_destination` use cases now
+  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
+  arguments, matching the shape of `inspect_release`. The module-level
+  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
+  singletons that previously lived in
+  `alfred/application/filesystem/resolve_destination.py` are removed —
+  the application layer no longer reaches into infrastructure. The
+  singletons now live at the agent-tools frontier
+  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
+  instantiate them once and thread them through. `analyze_release` no
+  longer needs the dirty `from ... import _KB` indirection. Tests
+  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
+  of monkeypatching a module attribute.
+- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
+  collided with `pathlib.Path` in code-reading mental models, and was
+  one letter from `parse_path` (the field that holds the value) — making
+  it harder than it needed to be to spot the type vs the attribute.
+  ``TokenizationRoute`` says what it actually captures (DIRECT /
+  SANITIZED / AI = how the name reached the tokenizer), and the class
+  docstring now spells out the orthogonality with ``Road`` (EASY /
+  SHITTY / PATH_OF_PAIN, which captures parser confidence on
+  ``ParseReport``). The ``parse_path`` field name stays unchanged —
+  string values too — so YAML fixtures, the ``analyze_release`` tool
+  spec, and any external consumer are untouched.
+- **`enrich_from_probe` codec mappings moved to YAML.** The three
+  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
+  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
+  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
+  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
+  `ReleaseKnowledge.probe_mappings` (new port field, populated by
+  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
+  parameter and reads the maps from there. Aligns with the CLAUDE.md
+  rule that lookup tables of domain knowledge belong in YAML, not in
+  Python — and opens the door to a future "learn new codec" pass.
+  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
+  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
+- **`ParsedRelease.tech_string` is now a derived `@property`**
+  (`alfred/domain/release/value_objects.py`). It computes
+  `quality.source.codec` joined by dots on every access, so it stays in
+  sync with the underlying fields by construction. The stored field is
+  gone from the dataclass, the dict returned by `assemble()` no longer
+  carries the key, `parse_release`'s malformed-name fallback drops the
+  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
+  it after filling `quality`/`source`/`codec`. Closes the
+  parser/enrichment double-source-of-truth that `e79ca46` had to fix
+  reactively. The fixtures runner now injects `tech_string` alongside
+  `is_season_pack` since `asdict()` skips properties.
+- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
+  valid levels (global, release_group, movie, show, season, episode)
+  was documented only in a docstring comment and validated nowhere.
+  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
+  serialization, `.value` access) while making the closed set explicit
+  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
+  YAML output is unchanged.
+- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
+  `__init__`.** Same public API (accepts `str | Path`), same behavior,
+  but the dataclass-generated `__init__` is no longer bypassed. One
+  less smell in the shared VOs.
+- **`Language` VO is strict by default; `Language.from_raw()` factory
+  for normalization.** The previous `__post_init__` mutated `iso` and
+  `aliases` via `object.__setattr__` on a frozen dataclass — a code
+  smell hiding behind the dataclass facade. Split: the direct
+  constructor now rejects un-normalized input (uppercase iso,
+  whitespace in aliases, etc.), and `Language.from_raw()` handles
+  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
+  the ISO YAML) needed migration.
+- **`ParsedRelease.normalised` renamed to `clean`.** The field name
+  promised "dots instead of spaces" but in practice held
+  `raw - site_tag - apostrophes` — only used by `season_folder_name()`.
+  Renamed and docstring corrected.
+- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
+  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
+  tolerant `__post_init__` coerced raw strings. With both classes
+  being `(str, Enum)`, the coercion served no purpose. Strict
+  constructor; `.value` no longer passed at call sites; dropped the
+  unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.
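
Two of the entries in this section, the frozen `ParsedRelease` that is enriched through `dataclasses.replace` and the derived `tech_string` property, combine naturally. The sketch below shows that pattern in isolation. `ReleaseSketch` and `fill_missing` are hypothetical names, the field set is trimmed down, and skipping empty parts when joining `tech_string` is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    title: str
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    languages: Tuple[str, ...] = ()  # tuple, not list, so the VO stays immutable

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot drift from the fields.
        parts = (self.quality, self.source, self.codec)
        return ".".join(part for part in parts if part)


def fill_missing(parsed: ReleaseSketch, **probed: Optional[str]) -> ReleaseSketch:
    """Fill only fields the parser left empty; never overwrite parsed values."""
    updates = {
        name: value
        for name, value in probed.items()
        if value is not None and getattr(parsed, name) is None
    }
    # Allocate a new instance only when at least one field actually changes.
    return replace(parsed, **updates) if updates else parsed
```

Callers rebind the result (`parsed = fill_missing(parsed, ...)`), and `tech_string` reflects the new fields without any explicit refresh step.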
+
+### Removed
+
+- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
+  validator. Its only consumer (`MovieService.validate_movie_file`)
+  had been removed during an earlier refactor. The "real movie vs
+  sample" rule now lives in extension-based exclusion
+  (`application/release/supported_media.py`) and PoP. If a size
+  threshold is ever needed, it'll go in a knowledge YAML, not in
+  `settings`.
+
+### Internal
+
+- **Flattened `alfred.domain.shared.media/` package into a single
+  `media.py` module.** The 6-file package (audio, video, subtitle,
+  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
+  LoC module. All 12 import sites continue to resolve unchanged
+  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
+  since Python treats `media.py` and `media/__init__.py`
+  interchangeably for import paths. Easier to scan when the whole
+  bounded-context fits on one screen.
+- **`SubtitleKnowledgeBase` types `language_registry` against the
+  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
+  class. The default constructor still instantiates the concrete adapter
+  when no repository is injected — behaviour is unchanged for existing
+  callers. Opens the door to in-memory fakes in future tests without
+  loading the full ISO 639 YAML.
+- **Moved `detect_media_type` and `enrich_from_probe` from
+  `alfred.application.filesystem` to `alfred.application.release`**.
+  They are inspection-pipeline helpers — their natural home is next to
+  `inspect_release`, not next to the filesystem use cases. The move
+  also eliminates a circular-import workaround in
+  `resolve_destination.py`: `inspect_release` can now be imported at
+  module top instead of lazily inside `_resolve_parsed`. Public
+  surface is unchanged for callers that imported the helpers from
+  their full module paths (the only call sites — `inspect.py`, two
+  tests, one testing script — were updated in this commit).
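
The `LanguageRepository` entry above mentions in-memory fakes as a future possibility. A rough sketch of what that could look like follows. Only the five method names come from the changelog. The signatures, the `Lang` stand-in, and the `InMemoryLanguages` and `count_known` helpers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Lang:
    """Stand-in for the project's Language VO (fields are assumptions)."""

    iso: str
    aliases: tuple = ()


class LanguageRepositorySketch(Protocol):
    """Structural port: any object with these methods satisfies it."""

    def from_iso(self, iso: str) -> Optional[Lang]: ...
    def from_any(self, raw: str) -> Optional[Lang]: ...
    def all(self) -> list: ...
    def __contains__(self, raw: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Test fake: no YAML loading, just the languages handed to it."""

    def __init__(self, languages: Iterable[Lang]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}

    def from_iso(self, iso: str) -> Optional[Lang]:
        return self._by_iso.get(iso.strip().lower())

    def from_any(self, raw: str) -> Optional[Lang]:
        key = raw.strip().lower()
        for lang in self._by_iso.values():
            if key == lang.iso or key in lang.aliases:
                return lang
        return None

    def all(self) -> list:
        return list(self._by_iso.values())

    def __contains__(self, raw: object) -> bool:
        return isinstance(raw, str) and self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


def count_known(repo: LanguageRepositorySketch, tokens: Iterable[str]) -> int:
    # Domain-side code depends on the Protocol only, never on the adapter.
    return sum(1 for token in tokens if token in repo)
```

Since the Protocol is structural, `InMemoryLanguages` does not inherit from it, and a type checker still accepts it wherever `LanguageRepositorySketch` is expected.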
+
+### Added
+
+- **`resolve_*_destination` use cases now consume `inspect_release`**.
+  `resolve_episode_destination` and `resolve_movie_destination` reuse
+  their existing `source_file` parameter as the inspection target;
+  `resolve_season_destination` and `resolve_series_destination` gain
+  a new **optional** `source_path` parameter (also threaded through
+  the tool wrappers and YAML specs). When the path exists, ffprobe
+  data fills tokens missing from the release name (e.g. quality) and
+  refreshes `tech_string`, so the destination folder / file names
+  end up more accurate. When the path is omitted or does not exist
+  (back-compat callers), the use cases fall back to parse-only, with
+  the same behavior as before.
+
+### Fixed
+
+- **`enrich_from_probe` now refreshes `tech_string`** after filling
+  `quality` / `source` / `codec`. Previously the field stayed at its
+  parser-time value, so filename builders saw stale tech tokens even
+  after a successful probe. New `TestTechString` class in
+  `tests/application/test_enrich_from_probe.py` locks the behavior.
+
+### Added
+
+- **`inspect_release` orchestrator + `InspectedResult` VO**
+  (`alfred/application/release/inspect.py`). Single composition of the
+  four inspection layers: `parse_release` → `detect_media_type` (patches
+  `parsed.media_type`) → `find_main_video` (top-level scan) →
+  `prober.probe` + `enrich_from_probe` when a video exists and the
+  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
+  `InspectedResult(parsed, report, source_path, main_video, media_info,
+  probe_used)` that downstream callers consume directly instead of
+  rebuilding the same chain. `kb` and `prober` are injected — no
+  module-level singletons. Never raises.
+
+### Changed
+
+- **`analyze_release` tool now delegates to `inspect_release`** — same
+  output shape, plus two new fields: `confidence` (0–100) and `road`
+  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
+  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
+  both fields so the LLM can route releases by confidence.
+
+- **`MediaProber` port now covers full media probing**: added
+  `probe(video) -> MediaInfo | None` alongside the existing
+  `list_subtitle_streams`. `FfprobeMediaProber` (in
+  `alfred/infrastructure/probe/`) implements both methods and is now
+  the single adapter shelling out to `ffprobe`. The standalone
+  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
+  all callers (tools, testing scripts) instantiate
+  `FfprobeMediaProber` instead. Unblocks the upcoming
+  `inspect_release` orchestrator, which depends on the port.
+
+### Removed
+
+- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
+  `FfprobeMediaProber` adapter).
+
+---
+
+## [2026-05-20] — Release parser confidence scoring + exclusion
+
+### Added
+
+- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
+  `is_supported_video(path, kb)` (extension-only check against
+  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
+  scan, lexicographically-first eligible file, returns `None` when no
+  video qualifies; accepts a bare file as folder for single-file
+  releases). No size threshold, no filename heuristics —
+  PATH_OF_PAIN handles the exotic cases. Foundation for the future
+  `inspect_release` orchestrator.
+
+- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
+  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
+  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
+  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
+  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
+  critical fields. EASY is decided structurally (a group schema
+  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
+  YAML-configurable cutoff (default 60). Weights and penalties also
+  live in `scoring.yaml` — title 30, media_type 20, year 15, season
+  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
+  -30. `Road` is a new enum, distinct from `ParsePath` (which records
+  the tokenization route, not the confidence tier). `ReleaseKnowledge`
+  port gains a `scoring: dict` field.
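
The scoring arithmetic described in the entry above can be worked through in a few lines. This is a minimal sketch, not the contents of `scoring.py`: the weights, the per-token penalty, the cap and the cutoff come from the changelog, but which fields count as "tech" (here `quality`, `source`, `codec`, `group`), the clamp to 0 to 100, and the `>=` comparison against the cutoff are assumptions.

```python
from typing import Iterable

# Weights as listed in the changelog; the four "tech" fields are an assumption.
WEIGHTS = {
    "title": 30,
    "media_type": 20,
    "year": 15,
    "season": 10,
    "episode": 5,
    "quality": 5,
    "source": 5,
    "codec": 5,
    "group": 5,
}
UNKNOWN_PENALTY = 5   # per residual UNKNOWN token
PENALTY_CAP = 30      # the penalty never exceeds this
CUTOFF = 60           # default SHITTY / PATH_OF_PAIN boundary


def confidence(fields_present: Iterable[str], unknown_tokens: int) -> int:
    """Sum the weights of the fields that were found, minus the capped penalty."""
    present = set(fields_present)
    earned = sum(weight for name, weight in WEIGHTS.items() if name in present)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, PENALTY_CAP)
    return max(0, min(100, earned - penalty))


def road(schema_matched: bool, score: int) -> str:
    """EASY is structural; the score only separates the other two tiers."""
    if schema_matched:
        return "easy"
    return "shitty" if score >= CUTOFF else "path_of_pain"
```

For example, a movie name that yields title, media type, year and three tech fields earns 30 + 20 + 15 + 15 = 80; two leftover UNKNOWN tokens subtract 10, giving 70, which lands on the `shitty` road when no group schema matched.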
+
+### Changed
+
+- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
+  ParseReport]` instead of returning a bare `ParsedRelease`. Call
+  sites updated in `application/filesystem/resolve_destination.py` and
+  `agent/tools/filesystem.py`. Tests updated accordingly.
+
+---
+
+## [2026-05-20] — Release parser v2 (EASY + SHITTY)
+
+### Added
+
+- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
+  new annotate-based pipeline (tokenize → annotate → assemble) drives
+  releases from known groups. Exposes `Token` (frozen VO with `index` +
+  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
+  and `GroupSchema` / `SchemaChunk` value objects.
+  - `pipeline.tokenize`: string-ops separator split (no regex), strips
+    a `[site.tag]` prefix/suffix first.
+  - `pipeline.annotate`: detects the trailing group right-to-left
+    (priority to `codec-GROUP` shape, fallback to any non-source dashed
+    token), looks up its `GroupSchema`, then walks tokens and schema
+    chunks in lockstep — optional chunks that don't match are skipped,
+    mandatory mismatches abort EASY and return `None` so the caller can
+    fall back to SHITTY.
+  - `pipeline.assemble`: folds annotated tokens into a
+    `ParsedRelease`-compatible dict.
+  - `parse_release` (in `release.services`) tries the v2 EASY path first
+    and falls through to the legacy SHITTY heuristic on `None`. Legacy
+    SHITTY/PATH OF PAIN behavior is unchanged.
+  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
+    rarbg}.yaml` declare the canonical chunk order per group, loaded via
+    new `ReleaseKnowledge.group_schema(name)` port method.
+  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
+    cover token VOs, site-tag stripping, group detection, schema-driven
+    annotation (movie, TV episode, season pack with optional source),
+    and field assembly.
+
+- **Release parser v2 — enricher pass** completes the EASY pipeline.
+  The structural schema walk now tolerates non-positional tokens
+  between chunks (instead of aborting on leftover tokens), and a second
+  pass tags them with audio / video-meta / edition / language roles.
+  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
+  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
+  matched before single tokens. Channel layouts like `5.1` and `7.1`
+  (split into two tokens by the `.` separator) are detected as
+  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
+  marker so `assemble` extracts the canonical value only from the
+  primary token. KONTRAST releases with audio / HDR / edition / language
+  metadata now produce a fully populated `ParsedRelease`.
+
+- **Streaming distributor as a separate dimension** from encoding source.
+  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
+  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
+  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
+  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
+  platform that produced the release is now recorded distinctly. The
+  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
+  from `sources.yaml`.
+
 - **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
   each documenting an expected `ParsedRelease` plus the future `routing`
   (library / torrents / seed_hardlinks) for the upcoming `organize_media`
@@ -54,6 +418,22 @@ callers).

 ### Changed

+- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
+  The legacy ~480-line heuristic block in `release/services.py` is gone;
+  `pipeline._annotate_shitty` does a single pass that looks each token
+  up in the kb buckets (resolutions / sources / codecs / distributors /
+  year / `SxxExx`) with first-match-wins semantics, and the leftmost
+  contiguous UNKNOWN run becomes the title. `annotate()` no longer
+  returns `None` — SHITTY is the always-on fallback when no group schema
+  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
+  (`deutschland_franchise_box`, `sleaford_yt_slug`,
+  `super_mario_bilingual`, `predator_space_separators` — the last one
+  moved from `shitty/` → `path_of_pain/`) are now marked
+  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
+  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
+  `xfail_reason` field; the parametrized suite wires the xfail mark
+  automatically.
+
 - **`parse_release` tokenizer is now data-driven**: it splits on any character
   listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
   This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
@@ -13,8 +13,6 @@ from alfred.application.filesystem import (
     MoveMediaUseCase,
     SetFolderPathUseCase,
 )
-from alfred.application.filesystem.detect_media_type import detect_media_type
-from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
 from alfred.application.filesystem.resolve_destination import (
     resolve_episode_destination as _resolve_episode_destination,
 )
@@ -28,10 +26,16 @@ from alfred.application.filesystem.resolve_destination import (
     resolve_series_destination as _resolve_series_destination,
 )
 from alfred.infrastructure.filesystem import FileManager, create_folder, move
-from alfred.infrastructure.filesystem.ffprobe import probe
-from alfred.infrastructure.filesystem.find_video import find_video_file
+from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
 from alfred.infrastructure.metadata import MetadataStore
 from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.probe import FfprobeMediaProber
+
+# Agent-tools frontier: this is the legitimate home for the singletons that
+# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
+# as required params; tests inject their own stubs.
+_KB = YamlReleaseKnowledge()
+_PROBER = FfprobeMediaProber()

 _LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"

@@ -57,10 +61,17 @@ def resolve_season_destination(
     tmdb_title: str,
     tmdb_year: int,
     confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> dict[str, Any]:
     """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
     return _resolve_season_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        _KB,
+        _PROBER,
+        confirmed_folder,
+        source_path,
     ).to_dict()


@@ -78,6 +89,8 @@ def resolve_episode_destination(
         source_file,
         tmdb_title,
         tmdb_year,
+        _KB,
+        _PROBER,
         tmdb_episode_title,
         confirmed_folder,
     ).to_dict()
@@ -91,7 +104,7 @@ def resolve_movie_destination(
 ) -> dict[str, Any]:
     """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
     return _resolve_movie_destination(
-        release_name, source_file, tmdb_title, tmdb_year
+        release_name, source_file, tmdb_title, tmdb_year, _KB, _PROBER
     ).to_dict()


@@ -100,10 +113,17 @@ def resolve_series_destination(
     tmdb_title: str,
     tmdb_year: int,
     confirmed_folder: str | None = None,
+    source_path: str | None = None,
 ) -> dict[str, Any]:
     """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
     return _resolve_series_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
+        release_name,
+        tmdb_title,
+        tmdb_year,
+        _KB,
+        _PROBER,
+        confirmed_folder,
+        source_path,
     ).to_dict()


@@ -190,22 +210,10 @@ def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
|
|||||||
|
|
||||||
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
||||||
from alfred.application.filesystem.resolve_destination import _KB # noqa: PLC0415
|
from alfred.application.release import inspect_release # noqa: PLC0415
|
||||||
from alfred.domain.release.services import parse_release # noqa: PLC0415
|
|
||||||
|
|
||||||
path = Path(source_path)
|
|
||||||
parsed = parse_release(release_name, _KB)
|
|
||||||
parsed.media_type = detect_media_type(parsed, path, _KB)
|
|
||||||
|
|
||||||
probe_used = False
|
|
||||||
if parsed.media_type not in ("unknown", "other"):
|
|
||||||
video_file = find_video_file(path, _KB)
|
|
||||||
if video_file:
|
|
||||||
media_info = probe(video_file)
|
|
||||||
if media_info:
|
|
||||||
enrich_from_probe(parsed, media_info)
|
|
||||||
probe_used = True
|
|
||||||
|
|
||||||
|
result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
|
||||||
|
parsed = result.parsed
|
||||||
return {
|
return {
|
||||||
"status": "ok",
|
"status": "ok",
|
||||||
"media_type": parsed.media_type,
|
"media_type": parsed.media_type,
|
||||||
@@ -227,7 +235,10 @@ def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
|||||||
"edition": parsed.edition,
|
"edition": parsed.edition,
|
||||||
"site_tag": parsed.site_tag,
|
"site_tag": parsed.site_tag,
|
||||||
"is_season_pack": parsed.is_season_pack,
|
"is_season_pack": parsed.is_season_pack,
|
||||||
"probe_used": probe_used,
|
"probe_used": result.probe_used,
|
||||||
|
"confidence": result.report.confidence,
|
||||||
|
"road": result.report.road,
|
||||||
|
"recommended_action": result.recommended_action,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -241,7 +252,7 @@ def probe_media(source_path: str) -> dict[str, Any]:
|
|||||||
"message": f"{source_path} does not exist",
|
"message": f"{source_path} does not exist",
|
||||||
}
|
}
|
||||||
|
|
||||||
media_info = probe(path)
|
media_info = _PROBER.probe(path)
|
||||||
if media_info is None:
|
if media_info is None:
|
||||||
return {
|
return {
|
||||||
"status": "error",
|
"status": "error",
|
||||||
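Review note: the wrappers in this file now hand the module-level `_KB` and `_PROBER` to the use cases instead of letting them reach for globals. Below is a minimal sketch of why that helps in tests, assuming a prober port whose `probe()` returns `None` on failure (as the `media_info is None` branch above suggests). `Prober`, `StubProber`, the dict-shaped media info and this simplified `probe_media` are hypothetical stand-ins, not the project's code.

```python
from pathlib import Path
from typing import Any, Optional, Protocol


class Prober(Protocol):
    """Hypothetical port: probe() returns None when probing fails."""

    def probe(self, path: Path) -> Optional[dict]: ...


class StubProber:
    """Test double that returns a canned result without touching ffprobe."""

    def __init__(self, result: Optional[dict]) -> None:
        self._result = result

    def probe(self, path: Path) -> Optional[dict]:
        return self._result


def probe_media(source_path: str, prober: Prober) -> dict[str, Any]:
    """Simplified wrapper: turn a None probe result into an error payload."""
    info = prober.probe(Path(source_path))
    if info is None:
        return {"status": "error", "message": f"failed to probe {source_path}"}
    return {"status": "ok", **info}
```

With the prober passed in, both the failure and success paths can be exercised without a real media file on disk.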
|
|||||||
@@ -80,3 +80,6 @@ returns:
|
|||||||
site_tag: Source-site tag if present.
|
site_tag: Source-site tag if present.
|
||||||
is_season_pack: True when the folder contains a full season.
|
is_season_pack: True when the folder contains a full season.
|
||||||
probe_used: True when ffprobe successfully enriched the result.
|
probe_used: True when ffprobe successfully enriched the result.
|
||||||
|
confidence: Parser confidence score, 0–100 (higher = more reliable).
|
||||||
|
road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
|
||||||
|
recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
|
||||||
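Review note: the three new return fields are meant to drive the orchestrator's next move. Below is a rough sketch of how a caller might branch on `recommended_action`, assuming the result dict carries the `status` and `recommended_action` keys described in the spec. The function name `next_step` and the step labels it returns are invented for illustration.

```python
from typing import Any


def next_step(analysis: dict[str, Any]) -> str:
    """Map an analyze_release-style result to a (hypothetical) next step."""
    if analysis.get("status") != "ok":
        return "report_error"

    action = analysis.get("recommended_action")
    if action == "skip":
        # No main video, or media_type=other: nothing to organize.
        return "ignore_release"
    if action == "process":
        # Confident parse: go straight to a resolve_*_destination tool.
        return "resolve_destination"
    # "ask_user", or any hint this sketch does not recognise:
    # fail safe by confirming with the user before auto-routing.
    return "confirm_with_user"
```

Treating an unrecognised hint like `ask_user` is a design choice of this sketch, not something the spec states.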
|
|||||||
@@ -61,6 +61,17 @@ parameters:
|
|||||||
one.
|
one.
|
||||||
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||||
|
|
||||||
|
source_path:
|
||||||
|
description: |
|
||||||
|
Absolute path to the release folder on disk. Optional.
|
||||||
|
why_needed: |
|
||||||
|
When provided, the tool runs ffprobe on the main video inside the
|
||||||
|
folder and uses the probe data to fill quality/codec tokens that
|
||||||
|
may be missing from the release name. The enriched tech tokens
|
||||||
|
end up in the destination folder name, so providing source_path
|
||||||
|
gives more accurate names for releases with sparse metadata.
|
||||||
|
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||||
|
|
||||||
returns:
|
returns:
|
||||||
ok:
|
ok:
|
||||||
description: Paths resolved unambiguously; ready to move.
|
description: Paths resolved unambiguously; ready to move.
|
||||||
|
|||||||
@@ -56,6 +56,16 @@ parameters:
|
|||||||
Forces the use case to use this exact folder name and skip detection.
|
Forces the use case to use this exact folder name and skip detection.
|
||||||
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
||||||
|
|
||||||
|
source_path:
|
||||||
|
description: |
|
||||||
|
Absolute path to the release folder on disk. Optional.
|
||||||
|
why_needed: |
|
||||||
|
When provided, the tool runs ffprobe on the main video inside the
|
||||||
|
folder and uses probe data to fill quality/codec tokens that may
|
||||||
|
be missing from the release name, producing a more accurate
|
||||||
|
destination folder name.
|
||||||
|
example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
|
||||||
|
|
||||||
returns:
|
returns:
|
||||||
ok:
|
ok:
|
||||||
description: Path resolved; ready to move the pack.
|
description: Path resolved; ready to move the pack.
|
||||||
|
|||||||
@@ -1,82 +0,0 @@
|
|||||||
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from alfred.domain.release.value_objects import ParsedRelease
|
|
||||||
from alfred.domain.shared.media import MediaInfo
|
|
||||||
|
|
||||||
# Map ffprobe codec names to scene-style codec tokens
|
|
||||||
_VIDEO_CODEC_MAP = {
|
|
||||||
"hevc": "x265",
|
|
||||||
"h264": "x264",
|
|
||||||
"h265": "x265",
|
|
||||||
"av1": "AV1",
|
|
||||||
"vp9": "VP9",
|
|
||||||
"mpeg4": "XviD",
|
|
||||||
}
|
|
||||||
|
|
||||||
# Map ffprobe audio codec names to scene-style tokens
|
|
||||||
_AUDIO_CODEC_MAP = {
|
|
||||||
"eac3": "EAC3",
|
|
||||||
"ac3": "AC3",
|
|
||||||
"dts": "DTS",
|
|
||||||
"truehd": "TrueHD",
|
|
||||||
"aac": "AAC",
|
|
||||||
"flac": "FLAC",
|
|
||||||
"opus": "OPUS",
|
|
||||||
"mp3": "MP3",
|
|
||||||
"pcm_s16l": "PCM",
|
|
||||||
"pcm_s24l": "PCM",
|
|
||||||
}
|
|
||||||
|
|
||||||
# Map channel count to standard layout string
|
|
||||||
_CHANNEL_MAP = {
|
|
||||||
8: "7.1",
|
|
||||||
6: "5.1",
|
|
||||||
2: "2.0",
|
|
||||||
1: "1.0",
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
|
|
||||||
"""
|
|
||||||
Fill None fields in parsed using data from ffprobe MediaInfo.
|
|
||||||
|
|
||||||
Only overwrites fields that are currently None — token-level values
|
|
||||||
from the release name always take priority.
|
|
||||||
Mutates parsed in place.
|
|
||||||
"""
|
|
||||||
if parsed.quality is None and info.resolution:
|
|
||||||
parsed.quality = info.resolution
|
|
||||||
|
|
||||||
if parsed.codec is None and info.video_codec:
|
|
||||||
parsed.codec = _VIDEO_CODEC_MAP.get(
|
|
||||||
info.video_codec.lower(), info.video_codec.upper()
|
|
||||||
)
|
|
||||||
|
|
||||||
if parsed.bit_depth is None and info.video_codec:
|
|
||||||
# ffprobe exposes bit depth via pix_fmt — not in MediaInfo yet, skip for now
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Audio — use the default track, fallback to first
|
|
||||||
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
|
||||||
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
|
||||||
|
|
||||||
if track:
|
|
||||||
if parsed.audio_codec is None and track.codec:
|
|
||||||
parsed.audio_codec = _AUDIO_CODEC_MAP.get(
|
|
||||||
track.codec.lower(), track.codec.upper()
|
|
||||||
)
|
|
||||||
|
|
||||||
if parsed.audio_channels is None and track.channels:
|
|
||||||
parsed.audio_channels = _CHANNEL_MAP.get(
|
|
||||||
track.channels, f"{track.channels}ch"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Languages — merge ffprobe languages with token-level ones
|
|
||||||
# "und" = undetermined, not useful
|
|
||||||
if info.audio_languages:
|
|
||||||
existing = set(parsed.languages)
|
|
||||||
for lang in info.audio_languages:
|
|
||||||
if lang.lower() != "und" and lang.upper() not in existing:
|
|
||||||
parsed.languages.append(lang)
|
|
||||||
@@ -4,7 +4,7 @@ import logging
|
|||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.shared.value_objects import ImdbId
|
from alfred.domain.shared.value_objects import ImdbId
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
|
from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
|
||||||
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
||||||
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
|
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
|
||||||
@@ -278,7 +278,7 @@ class ManageSubtitlesUseCase:
|
|||||||
|
|
||||||
|
|
||||||
def _to_unresolved_dto(
|
def _to_unresolved_dto(
|
||||||
track: SubtitleCandidate, min_confidence: float = 0.7
|
track: SubtitleScanResult, min_confidence: float = 0.7
|
||||||
) -> UnresolvedTrack:
|
) -> UnresolvedTrack:
|
||||||
reason = "unknown_language" if track.language is None else "low_confidence"
|
reason = "unknown_language" if track.language is None else "low_confidence"
|
||||||
return UnresolvedTrack(
|
return UnresolvedTrack(
|
||||||
@@ -291,10 +291,10 @@ def _to_unresolved_dto(
|
|||||||
|
|
||||||
def _pair_placed_with_tracks(
|
def _pair_placed_with_tracks(
|
||||||
placed: list[PlacedTrack],
|
placed: list[PlacedTrack],
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
) -> list[tuple[PlacedTrack, SubtitleCandidate]]:
|
) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
|
||||||
"""
|
"""
|
||||||
Pair each PlacedTrack with its originating SubtitleCandidate by source path.
|
Pair each PlacedTrack with its originating SubtitleScanResult by source path.
|
||||||
Falls back to positional matching if paths don't align.
|
Falls back to positional matching if paths don't align.
|
||||||
"""
|
"""
|
||||||
track_by_path = {t.file_path: t for t in tracks if t.file_path}
|
track_by_path = {t.file_path: t for t in tracks if t.file_path}
|
||||||
|
|||||||
@@ -22,16 +22,35 @@ import logging
|
|||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.application.release import inspect_release
|
||||||
from alfred.domain.release import parse_release
|
from alfred.domain.release import parse_release
|
||||||
from alfred.domain.release.ports import ReleaseKnowledge
|
from alfred.domain.release.ports import ReleaseKnowledge
|
||||||
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
|
from alfred.domain.shared.ports import MediaProber
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.infrastructure.persistence import get_memory
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
# Single module-level knowledge instance. YAML is loaded once at first import.
|
|
||||||
# Tests that need a custom KB can monkeypatch this attribute.
|
def _resolve_parsed(
|
||||||
_KB: ReleaseKnowledge = YamlReleaseKnowledge()
|
release_name: str,
|
||||||
|
source_path: str | None,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
|
) -> ParsedRelease:
|
||||||
|
"""Pick the right entry point depending on whether we have a path.
|
||||||
|
|
||||||
|
When ``source_path`` is provided and points to something that exists,
|
||||||
|
we run the full inspection pipeline so probe data can refresh tech
|
||||||
|
fields (which feed every filename builder). Otherwise we fall back
|
||||||
|
to a parse-only path — same behavior as before.
|
||||||
|
"""
|
||||||
|
if source_path:
|
||||||
|
path = Path(source_path)
|
||||||
|
if path.exists():
|
||||||
|
return inspect_release(release_name, path, kb, prober).parsed
|
||||||
|
parsed, _ = parse_release(release_name, kb)
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
|
||||||
def _find_existing_tvshow_folders(
|
def _find_existing_tvshow_folders(
|
||||||
@@ -236,13 +255,20 @@ def resolve_season_destination(
|
|||||||
release_name: str,
|
release_name: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
|
source_path: str | None = None,
|
||||||
) -> ResolvedSeasonDestination:
|
) -> ResolvedSeasonDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination paths for a season pack.
|
Compute destination paths for a season pack.
|
||||||
|
|
||||||
Returns series_folder + season_folder. No file paths — the whole
|
Returns series_folder + season_folder. No file paths — the whole
|
||||||
source folder is moved as-is into season_folder.
|
source folder is moved as-is into season_folder.
|
||||||
|
|
||||||
|
When ``source_path`` points to the release on disk, the parsed result
|
||||||
|
is enriched with ffprobe data so tech tokens missing from the release
|
||||||
|
name (quality / codec) end up in the folder names.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -252,8 +278,8 @@ def resolve_season_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
resolved = _resolve_series_folder(
|
resolved = _resolve_series_folder(
|
||||||
@@ -286,6 +312,8 @@ def resolve_episode_destination(
|
|||||||
source_file: str,
|
source_file: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
tmdb_episode_title: str | None = None,
|
tmdb_episode_title: str | None = None,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
) -> ResolvedEpisodeDestination:
|
) -> ResolvedEpisodeDestination:
|
||||||
@@ -293,6 +321,8 @@ def resolve_episode_destination(
|
|||||||
Compute destination paths for a single episode file.
|
Compute destination paths for a single episode file.
|
||||||
|
|
||||||
Returns series_folder + season_folder + library_file (full path to .mkv).
|
Returns series_folder + season_folder + library_file (full path to .mkv).
|
||||||
|
``source_file`` doubles as the inspection target — when it exists,
|
||||||
|
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -302,11 +332,11 @@ def resolve_episode_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||||
ext = Path(source_file).suffix
|
ext = Path(source_file).suffix
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
tmdb_episode_title_safe = (
|
tmdb_episode_title_safe = (
|
||||||
_KB.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
||||||
)
|
)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
@@ -345,11 +375,15 @@ def resolve_movie_destination(
|
|||||||
source_file: str,
|
source_file: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
) -> ResolvedMovieDestination:
|
) -> ResolvedMovieDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination paths for a movie file.
|
Compute destination paths for a movie file.
|
||||||
|
|
||||||
Returns movie_folder + library_file (full path to .mkv).
|
Returns movie_folder + library_file (full path to .mkv).
|
||||||
|
``source_file`` doubles as the inspection target — when it exists,
|
||||||
|
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
memory = get_memory()
|
memory = get_memory()
|
||||||
movies_root = memory.ltm.library_paths.get("movie")
|
movies_root = memory.ltm.library_paths.get("movie")
|
||||||
@@ -360,9 +394,9 @@ def resolve_movie_destination(
|
|||||||
message="Movie library path is not configured.",
|
message="Movie library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||||
ext = Path(source_file).suffix
|
ext = Path(source_file).suffix
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
|
|
||||||
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
||||||
@@ -384,12 +418,18 @@ def resolve_series_destination(
|
|||||||
release_name: str,
|
release_name: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
|
source_path: str | None = None,
|
||||||
) -> ResolvedSeriesDestination:
|
) -> ResolvedSeriesDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination path for a complete multi-season series pack.
|
Compute destination path for a complete multi-season series pack.
|
||||||
|
|
||||||
Returns only series_folder — the whole pack lands directly inside it.
|
Returns only series_folder — the whole pack lands directly inside it.
|
||||||
|
|
||||||
|
When ``source_path`` points to the release on disk, ffprobe
|
||||||
|
enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -399,8 +439,8 @@ def resolve_series_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
resolved = _resolve_series_folder(
|
resolved = _resolve_series_folder(
|
||||||
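Review note: every resolver in this file now goes through `_resolve_parsed`, which only runs the full inspection when the given path exists on disk. The fallback logic can be sketched in isolation as below. The two callables stand in for `inspect_release(...).parsed` and `parse_release(...)`; the names `pick_entry_point` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def pick_entry_point(
    source_path: Optional[str],
    inspect: Callable[[Path], str],
    parse_only: Callable[[], str],
) -> str:
    """Run the full inspection only when source_path exists on disk."""
    if source_path:
        path = Path(source_path)
        if path.exists():
            return inspect(path)
    # No path given, or the path is missing: parse the name only.
    return parse_only()


def demo() -> list[str]:
    """Exercise the three cases: existing path, missing path, no path."""
    with tempfile.TemporaryDirectory() as tmp:
        missing = str(Path(tmp) / "missing")
        return [
            pick_entry_point(tmp, lambda p: "inspected", lambda: "parsed"),
            pick_entry_point(missing, lambda p: "inspected", lambda: "parsed"),
            pick_entry_point(None, lambda p: "inspected", lambda: "parsed"),
        ]
```

A stale or mistyped `source_path` therefore degrades to the old parse-only behavior instead of failing the resolve call.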
|
|||||||
@@ -0,0 +1,20 @@
|
|||||||
|
"""Release application layer — orchestrators sitting between domain
|
||||||
|
parsing and infrastructure I/O.
|
||||||
|
|
||||||
|
Public surface:
|
||||||
|
|
||||||
|
- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
|
||||||
|
filesystem helpers (extension-only filtering, top-level video pick).
|
||||||
|
- :func:`inspect_release` / :class:`InspectedResult` — full inspection
|
||||||
|
pipeline combining parse + filesystem refinement + probe enrichment.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .inspect import InspectedResult, inspect_release
|
||||||
|
from .supported_media import find_main_video, is_supported_video
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"InspectedResult",
|
||||||
|
"find_main_video",
|
||||||
|
"inspect_release",
|
||||||
|
"is_supported_video",
|
||||||
|
]
|
||||||
@@ -0,0 +1,74 @@
|
|||||||
|
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import replace
|
||||||
|
|
||||||
|
from alfred.domain.release.ports import ReleaseKnowledge
|
||||||
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
|
from alfred.domain.shared.media import MediaInfo
|
||||||
|
|
||||||
|
|
||||||
|
def enrich_from_probe(
|
||||||
|
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
|
||||||
|
) -> ParsedRelease:
|
||||||
|
"""
|
||||||
|
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
|
||||||
|
|
||||||
|
Only overwrites fields that are currently None — token-level values
|
||||||
|
from the release name always take priority. ``ParsedRelease`` is
|
||||||
|
frozen; this returns a new instance via :func:`dataclasses.replace`.
|
||||||
|
|
||||||
|
Translation tables (ffprobe codec name → scene token, channel count
|
||||||
|
→ layout) live in ``kb.probe_mappings`` (loaded from
|
||||||
|
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
|
||||||
|
reports a value with no mapping entry, the fallback is the uppercase
|
||||||
|
raw value so unknown codecs still surface in a predictable form.
|
||||||
|
"""
|
||||||
|
mappings = kb.probe_mappings
|
||||||
|
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
|
||||||
|
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
|
||||||
|
channel_map: dict[int, str] = mappings.get("audio_channels", {})
|
||||||
|
|
||||||
|
updates: dict[str, object] = {}
|
||||||
|
|
||||||
|
if parsed.quality is None and info.resolution:
|
||||||
|
updates["quality"] = info.resolution
|
||||||
|
|
||||||
|
if parsed.codec is None and info.video_codec:
|
||||||
|
updates["codec"] = video_codec_map.get(
|
||||||
|
info.video_codec.lower(), info.video_codec.upper()
|
||||||
|
)
|
||||||
|
|
||||||
|
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
|
||||||
|
|
||||||
|
# Audio — use the default track, fallback to first
|
||||||
|
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
||||||
|
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
||||||
|
|
||||||
|
if track:
|
||||||
|
if parsed.audio_codec is None and track.codec:
|
||||||
|
updates["audio_codec"] = audio_codec_map.get(
|
||||||
|
track.codec.lower(), track.codec.upper()
|
||||||
|
)
|
||||||
|
|
||||||
|
if parsed.audio_channels is None and track.channels:
|
||||||
|
updates["audio_channels"] = channel_map.get(
|
||||||
|
track.channels, f"{track.channels}ch"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Languages — merge ffprobe languages with token-level ones
|
||||||
|
# "und" = undetermined, not useful
|
||||||
|
if info.audio_languages:
|
||||||
|
existing_upper = {lang.upper() for lang in parsed.languages}
|
||||||
|
new_languages = list(parsed.languages)
|
||||||
|
for lang in info.audio_languages:
|
||||||
|
if lang.lower() != "und" and lang.upper() not in existing_upper:
|
||||||
|
new_languages.append(lang)
|
||||||
|
existing_upper.add(lang.upper())
|
||||||
|
if len(new_languages) != len(parsed.languages):
|
||||||
|
updates["languages"] = tuple(new_languages)
|
||||||
|
|
||||||
|
if not updates:
|
||||||
|
return parsed
|
||||||
|
return replace(parsed, **updates)
|
||||||
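Review note: the rewritten `enrich_from_probe` collects changes in a dict and applies them with `dataclasses.replace`, because the parsed value object is frozen. Below is a small self-contained sketch of that pattern, using a two-field stand-in instead of the real `ParsedRelease`. The `Release` class, `fill_missing`, and the inline codec map are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for ParsedRelease with only two fields."""

    quality: Optional[str] = None
    codec: Optional[str] = None


def fill_missing(
    release: Release, probed_codec: Optional[str], codec_map: dict[str, str]
) -> Release:
    """Fill codec only when it is None; name-level values take priority."""
    updates: dict[str, object] = {}
    if release.codec is None and probed_codec:
        # Unmapped codecs fall back to the uppercase raw value.
        updates["codec"] = codec_map.get(probed_codec.lower(), probed_codec.upper())
    if not updates:
        # Nothing to change: return the same instance, no copy made.
        return release
    return replace(release, **updates)
```

Returning the original instance when there are no updates keeps identity checks cheap for callers that want to know whether enrichment changed anything.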
@@ -0,0 +1,193 @@
|
|||||||
|
"""Release inspection orchestrator — the canonical "look at this thing"
|
||||||
|
entry point.
|
||||||
|
|
||||||
|
``inspect_release`` is the single composition of the four layers we
|
||||||
|
care about for a freshly-arrived release:
|
||||||
|
|
||||||
|
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
|
||||||
|
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
|
||||||
|
2. **Pick the main video** — :func:`find_main_video` runs a top-level
|
||||||
|
scan over the source path. If nothing qualifies the result still
|
||||||
|
completes; downstream callers decide what to do with a videoless
|
||||||
|
release.
|
||||||
|
3. **Refine the media type** — :func:`detect_media_type` uses the
|
||||||
|
on-disk extension mix to override any token-level guess (e.g. a
|
||||||
|
bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
|
||||||
|
frozen, so the refined value is applied with
|
||||||
|
:func:`dataclasses.replace` rather than patched in place.
|
||||||
|
4. **Probe the video** — the injected :class:`MediaProber` fills in
|
||||||
|
missing technical fields via :func:`enrich_from_probe`. Skipped
|
||||||
|
when there is no main video or when ``media_type`` ended up in
|
||||||
|
``{"unknown", "other"}`` (the probe would tell us nothing useful).
|
||||||
|
|
||||||
|
The return type is :class:`InspectedResult`, a frozen VO that bundles
|
||||||
|
everything downstream callers need (``analyze_release`` tool,
|
||||||
|
``resolve_destination``, future workflow stages) without forcing them
|
||||||
|
to redo the same four calls.
|
||||||
|
|
||||||
|
Design notes:
|
||||||
|
|
||||||
|
- **Application layer.** This module touches both domain
|
||||||
|
(``parse_release``) and infrastructure (``MediaProber`` port). That
|
||||||
|
is exactly application's job — orchestrate.
|
||||||
|
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
|
||||||
|
``prober`` as parameters; no module-level singletons here. Callers
|
||||||
|
(the tool wrapper, tests) decide what to plug in.
|
||||||
|
- **No in-place mutation.** ``ParsedRelease`` is frozen, so the refined
|
||||||
|
``media_type`` and the probe-filled ``None`` fields are each applied
|
||||||
|
with :func:`dataclasses.replace`, which returns a new instance. The
|
||||||
|
outer ``InspectedResult`` is frozen as well, so the *bundle* is
|
||||||
|
immutable from the caller's perspective.
|
||||||
|
- **Never raises.** Filesystem / probe errors surface as ``None``
|
||||||
|
fields on the result, never as exceptions — same contract as the
|
||||||
|
underlying adapters.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, replace
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.application.release.detect_media_type import detect_media_type
|
||||||
|
from alfred.application.release.enrich_from_probe import enrich_from_probe
|
||||||
|
from alfred.application.release.supported_media import find_main_video
|
||||||
|
from alfred.domain.release.ports import ReleaseKnowledge
|
||||||
|
from alfred.domain.release.services import parse_release
|
||||||
|
from alfred.domain.release.value_objects import (
|
||||||
|
MediaTypeToken,
|
||||||
|
ParsedRelease,
|
||||||
|
ParseReport,
|
||||||
|
)
|
||||||
|
from alfred.domain.shared.media import MediaInfo
|
||||||
|
from alfred.domain.shared.ports import MediaProber
|
||||||
|
|
||||||
|
|
||||||
|
# Media types for which a probe carries no useful information.
|
||||||
|
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
|
||||||
|
|
||||||
|
# Media types for which there's nothing for the organizer to do.
|
||||||
|
# ``other`` covers things like games / ISOs / archives sitting on the
|
||||||
|
# downloads folder. ``unknown`` does NOT belong here — those need a
|
||||||
|
# user decision, not a skip.
|
||||||
|
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
|
||||||
|
|
||||||
|
# Roads that signal the parser couldn't reach a confident answer on its
|
||||||
|
# own. ``Road`` values are kept as strings on the report to avoid a
|
||||||
|
# cross-package import here.
|
||||||
|
_ASK_USER_ROADS = frozenset({"path_of_pain"})
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class InspectedResult:
|
||||||
|
"""The full picture of a release: parsed name + filesystem reality.
|
||||||
|
|
||||||
|
Bundles everything the downstream pipeline needs after a single
|
||||||
|
inspection pass:
|
||||||
|
|
||||||
|
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
|
||||||
|
refined by :func:`detect_media_type` and ``None`` tech fields
|
||||||
|
filled in by :func:`enrich_from_probe` when a probe ran.
|
||||||
|
- ``report`` — :class:`ParseReport` from the parser (confidence +
|
||||||
|
road, untouched by inspection).
|
||||||
|
- ``source_path`` — the path the inspector was pointed at (file or
|
||||||
|
folder), as supplied by the caller.
|
||||||
|
- ``main_video`` — the canonical video file inside ``source_path``,
|
||||||
|
or ``None`` if no eligible file was found.
|
||||||
|
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
|
||||||
|
succeeded; ``None`` when no video was probed (no main video, or
|
||||||
|
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
|
||||||
|
failed.
|
||||||
|
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
|
||||||
|
``enrich_from_probe`` actually ran. Explicit flag so callers
|
||||||
|
don't have to re-derive the condition.
|
||||||
|
- ``recommended_action`` — derived hint for the orchestrator (see
|
||||||
|
property docstring). Encodes the exclusion / clarification /
|
||||||
|
go-ahead decision in one place so downstream callers don't
|
||||||
|
re-implement the same checks.
|
||||||
|
"""
|
||||||
|
|
||||||
|
parsed: ParsedRelease
|
||||||
|
report: ParseReport
|
||||||
|
source_path: Path
|
||||||
|
main_video: Path | None
|
||||||
|
media_info: MediaInfo | None
|
||||||
|
probe_used: bool
|
||||||
|
|
||||||
|
@property
|
||||||
|
def recommended_action(self) -> str:
|
||||||
|
"""Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
|
||||||
|
|
||||||
|
- ``"skip"`` — nothing to organize:
|
||||||
|
* the source has no main video file, **or**
|
||||||
|
* ``media_type`` is ``"other"`` (games / ISOs / archives).
|
||||||
|
- ``"ask_user"`` — a decision is required before any action:
|
||||||
|
* ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
|
||||||
|
* the parse landed on ``Road.PATH_OF_PAIN``
|
||||||
|
(low-confidence, malformed name, etc.).
|
||||||
|
- ``"process"`` — everything else: a confident parse with a
|
||||||
|
usable media type and a main video on disk. The orchestrator
|
||||||
|
can move straight to the planning step.
|
||||||
|
|
||||||
|
The check ordering matters: ``"skip"`` wins over ``"ask_user"``
|
||||||
|
because if there's no video to organize, no question to the
|
||||||
|
user can change that. ``"ask_user"`` then wins over
|
||||||
|
``"process"`` because a confident parse alone isn't enough if
|
||||||
|
the type or road still flag uncertainty.
|
||||||
|
"""
|
||||||
|
if self.main_video is None:
|
||||||
|
return "skip"
|
||||||
|
if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
|
||||||
|
return "skip"
|
||||||
|
if self.parsed.media_type.value == "unknown":
|
||||||
|
return "ask_user"
|
||||||
|
if self.report.road in _ASK_USER_ROADS:
|
||||||
|
return "ask_user"
|
||||||
|
return "process"
|
||||||
|
|
||||||
|
|
||||||
|
def inspect_release(
|
||||||
|
release_name: str,
|
||||||
|
source_path: Path,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
|
) -> InspectedResult:
|
||||||
|
"""Run the full inspection pipeline on ``release_name`` /
|
||||||
|
``source_path``.
|
||||||
|
|
||||||
|
See module docstring for the four-step flow. ``kb`` and ``prober``
|
||||||
|
are injected so the caller controls the knowledge base layering
|
||||||
|
and the probe adapter (real ffprobe in production, stubs in tests).
|
||||||
|
|
||||||
|
Never raises. A missing or unreadable ``source_path`` simply
|
||||||
|
results in ``main_video=None`` and ``media_info=None``.
|
||||||
|
"""
|
||||||
|
# Step 1: parse the release name.
parsed, report = parse_release(release_name, kb)
|
||||||
|
|
||||||
|
# Step 2: refine media_type from the on-disk extension mix.
|
||||||
|
# detect_media_type tolerates non-existent paths (returns parsed.media_type
|
||||||
|
# untouched), so no need to guard here. ParsedRelease is frozen — use
|
||||||
|
# dataclasses.replace to rebind with the refined value.
|
||||||
|
refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
|
||||||
|
if refined_media_type != parsed.media_type:
|
||||||
|
parsed = replace(parsed, media_type=refined_media_type)
|
||||||
|
|
||||||
|
# Step 3: pick the canonical main video (top-level scan only).
|
||||||
|
main_video = find_main_video(source_path, kb)
|
||||||
|
|
||||||
|
# Step 4: probe + enrich, when it makes sense.
|
||||||
|
media_info: MediaInfo | None = None
|
||||||
|
probe_used = False
|
||||||
|
if main_video is not None and parsed.media_type not in _NON_PROBABLE_MEDIA_TYPES:
|
||||||
|
media_info = prober.probe(main_video)
|
||||||
|
if media_info is not None:
|
||||||
|
parsed = enrich_from_probe(parsed, media_info, kb)
|
||||||
|
probe_used = True
|
||||||
|
|
||||||
|
return InspectedResult(
|
||||||
|
parsed=parsed,
|
||||||
|
report=report,
|
||||||
|
source_path=source_path,
|
||||||
|
main_video=main_video,
|
||||||
|
media_info=media_info,
|
||||||
|
probe_used=probe_used,
|
||||||
|
)
|
||||||
@@ -0,0 +1,74 @@
|
|||||||
|
"""Pre-pipeline exclusion — decide which files are worth parsing.
|
||||||
|
|
||||||
|
These helpers live one notch above the domain: they touch the
|
||||||
|
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
|
||||||
|
logic of their own. The goal is to filter out non-video files and pick
|
||||||
|
the canonical "main video" from a release folder *before* anything
|
||||||
|
hits :func:`~alfred.domain.release.parse_release`.
|
||||||
|
|
||||||
|
Design notes (Phase A bis, 2026-05-20):
|
||||||
|
|
||||||
|
- **Extension is the sole eligibility criterion.** A file is supported
|
||||||
|
iff its suffix is in ``kb.video_extensions``. No size threshold, no
|
||||||
|
filename heuristics ("sample", "trailer", …). If a release packs a
|
||||||
|
bloated featurette or names its sample alphabetically before the
|
||||||
|
main feature, that's PATH_OF_PAIN territory — not this layer's job.
|
||||||
|
|
||||||
|
- **Top-level scan only.** ``find_main_video`` does not descend into
|
||||||
|
subdirectories. Releases that wrap the main video in ``Sample/`` or
|
||||||
|
similar are non-scene-standard and handled by the orchestrator
|
||||||
|
upstream.
|
||||||
|
|
||||||
|
- **Lexicographic tie-break.** When several candidates qualify
|
||||||
|
(legitimate for season packs), we return the first by alphabetical
|
||||||
|
order. Deterministic, no size-based ranking.
|
||||||
|
|
||||||
|
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
|
||||||
|
is application, not domain. If isolation becomes necessary for
|
||||||
|
testing scale, we'll introduce a port then.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.domain.release.ports.knowledge import ReleaseKnowledge
|
||||||
|
|
||||||
|
|
||||||
|
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
|
||||||
|
"""Return True when ``path`` is a video file the parser should
|
||||||
|
consider.
|
||||||
|
|
||||||
|
The check is purely extension-based: ``path.suffix.lower()`` must
|
||||||
|
belong to ``kb.video_extensions``. ``path`` must also be a regular
|
||||||
|
file — directories and broken symlinks return False.
|
||||||
|
"""
|
||||||
|
if not path.is_file():
|
||||||
|
return False
|
||||||
|
return path.suffix.lower() in kb.video_extensions
|
||||||
|
|
||||||
|
|
||||||
|
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
|
||||||
|
"""Return the canonical main video file inside ``folder``, or
|
||||||
|
``None`` if there isn't one.
|
||||||
|
|
||||||
|
Behavior:
|
||||||
|
|
||||||
|
- Top-level scan only — subdirectories are ignored.
|
||||||
|
- Eligibility is :func:`is_supported_video`.
|
||||||
|
- When several files qualify, the lexicographically first one wins.
|
||||||
|
- When ``folder`` itself is a video file, it is returned as-is
|
||||||
|
(single-file releases are valid).
|
||||||
|
- When ``folder`` doesn't exist or isn't a directory (and isn't a
|
||||||
|
video file either), returns ``None``.
|
||||||
|
"""
|
||||||
|
if folder.is_file():
|
||||||
|
return folder if is_supported_video(folder, kb) else None
|
||||||
|
|
||||||
|
if not folder.is_dir():
|
||||||
|
return None
|
||||||
|
|
||||||
|
candidates = sorted(
|
||||||
|
child for child in folder.iterdir() if is_supported_video(child, kb)
|
||||||
|
)
|
||||||
|
return candidates[0] if candidates else None
|
||||||
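The selection rules listed in the `find_main_video` docstring can be exercised against a temporary folder. This is a sketch under stated assumptions: a plain `VIDEO_EXTENSIONS` set stands in for `kb.video_extensions`, and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path

# Assumed stand-in for kb.video_extensions.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4", ".avi"})


def is_supported_video(path, extensions=VIDEO_EXTENSIONS):
    # Regular files only; the suffix comparison is case-insensitive.
    return path.is_file() and path.suffix.lower() in extensions


def find_main_video(folder, extensions=VIDEO_EXTENSIONS):
    if folder.is_file():
        return folder if is_supported_video(folder, extensions) else None
    if not folder.is_dir():
        return None
    # Top-level scan only; ties are broken by sort order.
    candidates = sorted(
        child for child in folder.iterdir() if is_supported_video(child, extensions)
    )
    return candidates[0] if candidates else None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Show.S01E02.mkv").touch()
        (root / "Show.S01E01.MKV").touch()
        (root / "release.nfo").touch()
        (root / "Sample").mkdir()
        (root / "Sample" / "a-sample.mkv").touch()  # ignored: not top-level

        picked = find_main_video(root)
        single = find_main_video(root / "Show.S01E02.mkv")
        missing = find_main_video(root / "does-not-exist")
        return [picked.name, single.name, missing]
```

In this layout `demo()` picks `Show.S01E01.MKV` for the folder, returns the file itself for the single-file case, and returns `None` for the missing path.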
@@ -5,13 +5,13 @@ import os
|
|||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.domain.subtitles.value_objects import SubtitleType
|
from alfred.domain.subtitles.value_objects import SubtitleType
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
|
def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
|
||||||
"""
|
"""
|
||||||
Build the destination filename for a subtitle track.
|
Build the destination filename for a subtitle track.
|
||||||
|
|
||||||
@@ -41,7 +41,7 @@ class PlacedTrack:
|
|||||||
@dataclass
|
@dataclass
|
||||||
class PlaceResult:
|
class PlaceResult:
|
||||||
placed: list[PlacedTrack]
|
placed: list[PlacedTrack]
|
||||||
skipped: list[tuple[SubtitleCandidate, str]] # (track, reason)
|
skipped: list[tuple[SubtitleScanResult, str]] # (track, reason)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def placed_count(self) -> int:
|
def placed_count(self) -> int:
|
||||||
@@ -54,7 +54,7 @@ class PlaceResult:
|
|||||||
|
|
||||||
class SubtitlePlacer:
|
class SubtitlePlacer:
|
||||||
"""
|
"""
|
||||||
Hard-links matched SubtitleCandidate files next to a destination video.
|
Hard-links matched SubtitleScanResult files next to a destination video.
|
||||||
|
|
||||||
Uses the same hard-link strategy as FileManager.copy_file:
|
Uses the same hard-link strategy as FileManager.copy_file:
|
||||||
instant, no data duplication, qBittorrent keeps seeding.
|
instant, no data duplication, qBittorrent keeps seeding.
|
||||||
@@ -64,11 +64,11 @@ class SubtitlePlacer:
|
|||||||
|
|
||||||
def place(
|
def place(
|
||||||
self,
|
self,
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
destination_video: Path,
|
destination_video: Path,
|
||||||
) -> PlaceResult:
|
) -> PlaceResult:
|
||||||
placed: list[PlacedTrack] = []
|
placed: list[PlacedTrack] = []
|
||||||
skipped: list[tuple[SubtitleCandidate, str]] = []
|
skipped: list[tuple[SubtitleScanResult, str]] = []
|
||||||
|
|
||||||
dest_dir = destination_video.parent
|
dest_dir = destination_video.parent
|
||||||
|
|
||||||
|
|||||||
@@ -8,19 +8,22 @@ from ..shared.value_objects import FilePath, FileSize, ImdbId
|
|||||||
from .value_objects import MovieTitle, Quality, ReleaseYear
|
from .value_objects import MovieTitle, Quality, ReleaseYear
|
||||||
|
|
||||||
|
|
||||||
@dataclass(eq=False)
|
@dataclass(frozen=True, eq=False)
|
||||||
class Movie(MediaWithTracks):
|
class Movie(MediaWithTracks):
|
||||||
"""
|
"""
|
||||||
Movie aggregate root for the movies domain.
|
Movie aggregate root for the movies domain.
|
||||||
|
|
||||||
Carries file metadata (path, size) and the tracks discovered by the
|
Carries file metadata (path, size) and the tracks discovered by the
|
||||||
ffprobe + subtitle scan pipeline. The track lists may be empty when the
|
ffprobe + subtitle scan pipeline. The track tuples may be empty when the
|
||||||
movie is known but not yet scanned, or when no file is downloaded.
|
movie is known but not yet scanned, or when no file is downloaded.
|
||||||
|
|
||||||
Track helpers follow the same "C+" contract as ``Episode``: pass a
|
Track helpers follow the same "C+" contract as ``Episode``: pass a
|
||||||
``Language`` for cross-format matching, or a ``str`` for case-insensitive
|
``Language`` for cross-format matching, or a ``str`` for case-insensitive
|
||||||
direct comparison.
|
direct comparison.
|
||||||
|
|
||||||
|
Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
|
||||||
|
(audio/subtitle tracks, file metadata) onto a new instance.
|
||||||
|
|
||||||
Equality is identity-based: two ``Movie`` instances are equal iff they
|
Equality is identity-based: two ``Movie`` instances are equal iff they
|
||||||
share the same ``imdb_id``, regardless of file/track contents. This is
|
share the same ``imdb_id``, regardless of file/track contents. This is
|
||||||
the DDD aggregate invariant — the aggregate is identified by its root id.
|
the DDD aggregate invariant — the aggregate is identified by its root id.
|
||||||
@@ -34,15 +37,15 @@ class Movie(MediaWithTracks):
|
|||||||
file_size: FileSize | None = None
|
file_size: FileSize | None = None
|
||||||
tmdb_id: int | None = None
|
tmdb_id: int | None = None
|
||||||
added_at: datetime = field(default_factory=datetime.now)
|
added_at: datetime = field(default_factory=datetime.now)
|
||||||
audio_tracks: list[AudioTrack] = field(default_factory=list)
|
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||||
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
|
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||||
|
|
||||||
def __post_init__(self):
|
def __post_init__(self):
|
||||||
"""Validate movie entity."""
|
"""Validate movie entity."""
|
||||||
# Ensure ImdbId is actually an ImdbId instance
|
# Ensure ImdbId is actually an ImdbId instance
|
||||||
if not isinstance(self.imdb_id, ImdbId):
|
if not isinstance(self.imdb_id, ImdbId):
|
||||||
if isinstance(self.imdb_id, str):
|
if isinstance(self.imdb_id, str):
|
||||||
self.imdb_id = ImdbId(self.imdb_id)
|
object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))
|
||||||
else:
|
else:
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
||||||
@@ -51,7 +54,7 @@ class Movie(MediaWithTracks):
|
|||||||
# Ensure MovieTitle is actually a MovieTitle instance
|
# Ensure MovieTitle is actually a MovieTitle instance
|
||||||
if not isinstance(self.title, MovieTitle):
|
if not isinstance(self.title, MovieTitle):
|
||||||
if isinstance(self.title, str):
|
if isinstance(self.title, str):
|
||||||
self.title = MovieTitle(self.title)
|
object.__setattr__(self, "title", MovieTitle(self.title))
|
||||||
else:
|
else:
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
f"title must be MovieTitle or str, got {type(self.title)}"
|
f"title must be MovieTitle or str, got {type(self.title)}"
|
||||||
|
|||||||
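The switch to `frozen=True` explains the two `object.__setattr__` changes above: a frozen dataclass rejects normal attribute assignment, including inside `__post_init__`. The sketch below illustrates the pattern with simplified, hypothetical stand-ins for `Movie` and `ImdbId`; the `__eq__` / `__hash__` bodies are an assumption about how the id-based equality from the docstring might be implemented.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class ImdbId:
    value: str


@dataclass(frozen=True, eq=False)
class Movie:
    imdb_id: ImdbId
    audio_tracks: tuple = field(default_factory=tuple)

    def __post_init__(self):
        # Plain `self.imdb_id = ...` would raise FrozenInstanceError here,
        # so coercion has to go through object.__setattr__.
        if isinstance(self.imdb_id, str):
            object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))

    # Assumed implementation of "equal iff same imdb_id".
    def __eq__(self, other):
        return isinstance(other, Movie) and self.imdb_id == other.imdb_id

    def __hash__(self):
        return hash(self.imdb_id)


def assignment_is_rejected(movie):
    try:
        movie.audio_tracks = ("fra",)
    except FrozenInstanceError:
        return True
    return False


movie = Movie("tt0133093")
# Enrichment results are projected onto a new instance instead of mutating.
enriched = replace(movie, audio_tracks=("eng",))
```

Here `enriched` compares equal to `movie` because both share the same id, while `movie.audio_tracks` stays empty.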
@@ -1,6 +1,6 @@
|
|||||||
"""Release domain — release name parsing and naming conventions."""
|
"""Release domain — release name parsing and naming conventions."""
|
||||||
|
|
||||||
from .services import parse_release
|
from .services import parse_release
|
||||||
from .value_objects import ParsedRelease
|
from .value_objects import ParsedRelease, ParseReport
|
||||||
|
|
||||||
__all__ = ["ParsedRelease", "parse_release"]
|
__all__ = ["ParsedRelease", "ParseReport", "parse_release"]
|
||||||
|
|||||||
@@ -0,0 +1,31 @@
|
|||||||
|
"""Release parser v2 — annotate-based pipeline.
|
||||||
|
|
||||||
|
This package is the future home of ``parse_release``. It restructures the
|
||||||
|
parsing logic around a **tokenize → annotate → assemble** pipeline:
|
||||||
|
|
||||||
|
1. **tokenize**: split the release name into atomic tokens.
|
||||||
|
2. **annotate**: walk tokens left-to-right, assigning each one a
|
||||||
|
:class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
|
||||||
|
injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
|
||||||
|
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.
|
||||||
|
|
||||||
|
The pipeline has three internal paths driven by the detected release group:
|
||||||
|
|
||||||
|
- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
|
||||||
|
declared in ``knowledge/release/release_groups/<group>.yaml``.
|
||||||
|
- **SHITTY**: unknown group, best-effort matching against the global
|
||||||
|
knowledge sets, with a 0-100 confidence score.
|
||||||
|
- **PATH OF PAIN**: score below threshold OR critical chunks missing —
|
||||||
|
signaled to the caller, who decides whether to involve the LLM/user.
|
||||||
|
|
||||||
|
Today the package exposes scaffolding only (token VOs and a thin pipeline
|
||||||
|
stub). The legacy ``parse_release`` in ``release.services`` keeps serving
|
||||||
|
production until each piece of the v2 pipeline is wired in.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from .schema import GroupSchema, SchemaChunk
|
||||||
|
from .tokens import Token, TokenRole
|
||||||
|
|
||||||
|
__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
|
||||||
@@ -0,0 +1,763 @@
|
|||||||
|
"""Annotate-based pipeline.
|
||||||
|
|
||||||
|
Three stages:
|
||||||
|
|
||||||
|
1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
|
||||||
|
a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
|
||||||
|
tokenized.
|
||||||
|
2. :func:`annotate` — promote each token's :class:`TokenRole` using the
|
||||||
|
injected knowledge base. Two sub-passes:
|
||||||
|
|
||||||
|
a. **Structural** (schema-driven, EASY only). Detects the group at
|
||||||
|
the right end, looks up its :class:`GroupSchema`, then matches
|
||||||
|
the schema's chunk sequence against the token stream. Between
|
||||||
|
two structural chunks, any number of unmatched tokens may
|
||||||
|
remain — they are left UNKNOWN for the enricher pass to handle.
|
||||||
|
b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
|
||||||
|
audio / video-meta / edition / language roles. Multi-token
|
||||||
|
sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
|
||||||
|
matched first, single tokens after.
|
||||||
|
|
||||||
|
3. :func:`assemble` — fold annotated tokens into a
|
||||||
|
:class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
|
||||||
|
dict.
|
||||||
|
|
||||||
|
The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
|
||||||
|
arrives through ``kb: ReleaseKnowledge``.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from ..ports.knowledge import ReleaseKnowledge
|
||||||
|
from ..value_objects import MediaTypeToken
|
||||||
|
from .schema import GroupSchema
|
||||||
|
from .tokens import Token, TokenRole
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Stage 1 — tokenize
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def strip_site_tag(name: str) -> tuple[str, str | None]:
|
||||||
|
"""Split off a ``[site.tag]`` prefix or suffix.
|
||||||
|
|
||||||
|
Returns ``(clean_name, tag)``. If no tag is found, returns
|
||||||
|
``(name.strip(), None)``.
|
||||||
|
"""
|
||||||
|
s = name.strip()
|
||||||
|
|
||||||
|
if s.startswith("["):
|
||||||
|
close = s.find("]")
|
||||||
|
if close != -1:
|
||||||
|
tag = s[1:close].strip()
|
||||||
|
remainder = s[close + 1 :].strip()
|
||||||
|
if tag and remainder:
|
||||||
|
return remainder, tag
|
||||||
|
|
||||||
|
if s.endswith("]"):
|
||||||
|
open_bracket = s.rfind("[")
|
||||||
|
if open_bracket != -1:
|
||||||
|
tag = s[open_bracket + 1 : -1].strip()
|
||||||
|
remainder = s[:open_bracket].strip()
|
||||||
|
if tag and remainder:
|
||||||
|
return remainder, tag
|
||||||
|
|
||||||
|
return s, None
|
||||||
|
|
||||||
|
|
||||||
|
def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
|
||||||
|
"""Split ``name`` into tokens after stripping any site tag.
|
||||||
|
|
||||||
|
String-ops style: replace every configured separator with a single
|
||||||
|
NUL byte then split. NUL cannot legally appear in a release name, so
|
||||||
|
it's a safe sentinel.
|
||||||
|
"""
|
||||||
|
clean, site_tag = strip_site_tag(name)
|
||||||
|
|
||||||
|
DELIM = "\x00"
|
||||||
|
buf = clean
|
||||||
|
for sep in kb.separators:
|
||||||
|
if sep != DELIM:
|
||||||
|
buf = buf.replace(sep, DELIM)
|
||||||
|
|
||||||
|
pieces = [p for p in buf.split(DELIM) if p]
|
||||||
|
tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
|
||||||
|
return tokens, site_tag
|
||||||
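The site-tag stripping and NUL-sentinel split can be tried on plain strings. This is a simplified sketch: it returns strings instead of `Token` objects, and the `SEPARATORS` list is an assumed stand-in for `kb.separators`.

```python
SEPARATORS = [".", " ", "_"]  # assumed stand-in for kb.separators
DELIM = "\x00"  # sentinel that should never appear in a release name


def strip_site_tag(name):
    s = name.strip()
    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1:].strip()
            if tag and remainder:
                return remainder, tag
    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1:-1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag
    return s, None


def tokenize(name, separators=SEPARATORS):
    clean, site_tag = strip_site_tag(name)
    buf = clean
    for sep in separators:
        if sep != DELIM:
            buf = buf.replace(sep, DELIM)
    # Empty pieces (from doubled separators) are dropped.
    return [piece for piece in buf.split(DELIM) if piece], site_tag
```

For example, `tokenize("[YTS.MX] The.Matrix.1999.1080p.BluRay.x264-GROUP")` yields six tokens plus the tag `"YTS.MX"`; the dots inside the tag are never split because the tag is removed before the separators are replaced.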
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Helpers shared across passes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
|
||||||
|
"""Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
|
||||||
|
``Sxx-yy`` (season range) / ``NxNN``.
|
||||||
|
|
||||||
|
Returns ``(season, episode, episode_end)`` or ``None`` if the token
|
||||||
|
is not a season/episode marker. For ``Sxx-yy``, returns the first
|
||||||
|
season with no episode info — the caller is expected to detect the
|
||||||
|
range form and promote ``media_type`` to ``tv_complete`` separately.
|
||||||
|
"""
|
||||||
|
upper = text.upper()
|
||||||
|
|
||||||
|
# SxxExx form (and Sxx, Sxx-yy)
|
||||||
|
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
|
||||||
|
season = int(upper[1:3])
|
||||||
|
rest = upper[3:]
|
||||||
|
|
||||||
|
if not rest:
|
||||||
|
return season, None, None
|
||||||
|
|
||||||
|
# Sxx-yy season-range form: capture the first season, treat as a
|
||||||
|
# complete-series marker (no episode info).
|
||||||
|
if (
|
||||||
|
len(rest) == 3
|
||||||
|
and rest[0] == "-"
|
||||||
|
and rest[1:3].isdigit()
|
||||||
|
):
|
||||||
|
return season, None, None
|
||||||
|
|
||||||
|
episodes: list[int] = []
|
||||||
|
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
|
||||||
|
episodes.append(int(rest[1:3]))
|
||||||
|
rest = rest[3:]
|
||||||
|
|
||||||
|
if not episodes:
|
||||||
|
return None
|
||||||
|
# For chained multi-episode markers (E09E10E11), the range is the
|
||||||
|
# first → last episode. Intermediate values are implied.
|
||||||
|
return season, episodes[0], (episodes[-1] if len(episodes) >= 2 else None)
|
||||||
|
|
||||||
|
# NxNN form
|
||||||
|
if "X" in upper:
|
||||||
|
parts = upper.split("X")
|
||||||
|
if len(parts) >= 2 and all(p.isdigit() for p in parts):
|
||||||
|
season = int(parts[0])
|
||||||
|
episode = int(parts[1])
|
||||||
|
episode_end = int(parts[2]) if len(parts) >= 3 else None
|
||||||
|
return season, episode, episode_end
|
||||||
|
|
||||||
|
return None
|
||||||
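The return shapes documented for `_parse_season_episode` can be summarized with a compact regex-based variant. This is an alternative sketch, not the code in the diff; it is only meant to reproduce the documented `(season, episode, episode_end)` tuples for the sample tokens below.

```python
import re


def parse_season_episode(text):
    upper = text.upper()

    head = re.match(r"S(\d{2})(.*)$", upper)
    if head:
        season, rest = int(head.group(1)), head.group(2)
        # Bare "Sxx" and the "Sxx-yy" season range carry no episode info.
        if rest == "" or re.fullmatch(r"-\d{2}", rest):
            return season, None, None
        chain = re.match(r"(?:E\d{2})+", rest)
        if chain is None:
            return None
        episodes = [int(n) for n in re.findall(r"E(\d{2})", chain.group(0))]
        # Chained markers collapse to first -> last; intermediates are implied.
        last = episodes[-1] if len(episodes) >= 2 else None
        return season, episodes[0], last

    # NxNN form, e.g. "2x05".
    parts = upper.split("X")
    if len(parts) >= 2 and all(p.isdigit() for p in parts):
        episode_end = int(parts[2]) if len(parts) >= 3 else None
        return int(parts[0]), int(parts[1]), episode_end

    return None
```

With this sketch `S14E09E10E11` parses to `(14, 9, 11)`, while a codec token such as `x265` is rejected because the piece before the `x` is empty.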
|
|
||||||
|
|
||||||
|
def _is_year(text: str) -> bool:
|
||||||
|
"""Return True if ``text`` is a 4-digit year in [1900, 2099]."""
|
||||||
|
return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099
|
||||||
|
|
||||||
|
|
||||||
|
def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
|
||||||
|
"""Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.
|
||||||
|
|
||||||
|
Returns ``None`` if the token doesn't match the ``codec-GROUP``
|
||||||
|
shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
|
||||||
|
"""
|
||||||
|
if "-" not in text:
|
||||||
|
return None
|
||||||
|
head, _, tail = text.rpartition("-")
|
||||||
|
if head.lower() in kb.codecs:
|
||||||
|
return head, tail
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
|
||||||
|
"""Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
|
||||||
|
lower = text.lower()
|
||||||
|
|
||||||
|
if role is TokenRole.YEAR:
|
||||||
|
return TokenRole.YEAR if _is_year(text) else None
|
||||||
|
|
||||||
|
if role is TokenRole.SEASON_EPISODE:
|
||||||
|
return (
|
||||||
|
TokenRole.SEASON_EPISODE
|
||||||
|
if _parse_season_episode(text) is not None
|
||||||
|
else None
|
||||||
|
)
|
||||||
|
|
||||||
|
if role is TokenRole.RESOLUTION:
|
||||||
|
return TokenRole.RESOLUTION if lower in kb.resolutions else None
|
||||||
|
|
||||||
|
if role is TokenRole.SOURCE:
|
||||||
|
return TokenRole.SOURCE if lower in kb.sources else None
|
||||||
|
|
||||||
|
if role is TokenRole.CODEC:
|
||||||
|
return TokenRole.CODEC if lower in kb.codecs else None
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Stage 2a — group detection
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
|
||||||
|
"""Identify the release group by walking tokens right-to-left.
|
||||||
|
|
||||||
|
Returns ``(group_name, group_token_index)``. The index is
|
||||||
|
``None`` when the group is absent (no trailing ``-`` in the stream).
|
||||||
|
"""
|
||||||
|
# Priority 1: codec-GROUP shape (clearest signal).
|
||||||
|
for tok in reversed(tokens):
|
||||||
|
split = _split_codec_group(tok.text, kb)
|
||||||
|
if split is not None:
|
||||||
|
_, group = split
|
||||||
|
return (group or "UNKNOWN"), tok.index
|
||||||
|
|
||||||
|
# Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
|
||||||
|
for tok in reversed(tokens):
|
||||||
|
if "-" not in tok.text:
|
||||||
|
continue
|
||||||
|
head, _, tail = tok.text.rpartition("-")
|
||||||
|
if (
|
||||||
|
head.lower() in kb.sources
|
||||||
|
or tok.text.lower().replace("-", "") in kb.sources
|
||||||
|
):
|
||||||
|
continue
|
||||||
|
if tail:
|
||||||
|
return tail, tok.index
|
||||||
|
|
||||||
|
return "UNKNOWN", None
|
||||||
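The two-priority group detection can be illustrated on plain strings. In this sketch the token list holds raw text instead of `Token` objects, and the `CODECS` and `SOURCES` sets are assumed stand-ins for `kb.codecs` and `kb.sources`.

```python
CODECS = frozenset({"x264", "x265", "h264", "hevc"})  # assumed kb.codecs
SOURCES = frozenset({"bluray", "webrip", "web", "webdl"})  # assumed kb.sources


def split_codec_group(text):
    if "-" not in text:
        return None
    head, _, tail = text.rpartition("-")
    return (head, tail) if head.lower() in CODECS else None


def detect_group(tokens):
    # Priority 1: a codec-GROUP token, searched right to left.
    for index in reversed(range(len(tokens))):
        split = split_codec_group(tokens[index])
        if split is not None:
            return (split[1] or "UNKNOWN"), index

    # Priority 2: rightmost dashed token that is not a dashed source.
    for index in reversed(range(len(tokens))):
        text = tokens[index]
        if "-" not in text:
            continue
        head, _, tail = text.rpartition("-")
        if head.lower() in SOURCES or text.lower().replace("-", "") in SOURCES:
            continue
        if tail:
            return tail, index

    return "UNKNOWN", None
```

A name whose only dashed token is `WEB-DL` therefore yields `("UNKNOWN", None)`, and an empty group after the codec (`x265-`) yields `"UNKNOWN"` with the token index still reported.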
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Stage 2b — structural annotation (schema-driven)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _annotate_structural(
|
||||||
|
tokens: list[Token],
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
schema: GroupSchema,
|
||||||
|
group_token_index: int,
|
||||||
|
) -> list[Token] | None:
|
||||||
|
"""Annotate structural tokens following a known group schema.
|
||||||
|
|
||||||
|
Walks the schema's chunks against the body (tokens up to the group
|
||||||
|
token). For each chunk, scans forward in the body for a matching
|
||||||
|
token — tokens passed over without match are left UNKNOWN (the
|
||||||
|
enricher pass will handle them).
|
||||||
|
|
||||||
|
Returns ``None`` if any mandatory chunk fails to find a match.
|
||||||
|
"""
|
||||||
|
result = list(tokens)
|
||||||
|
|
||||||
|
# The codec-GROUP token carries CODEC + GROUP. Split it now so the
|
||||||
|
# schema walk knows the codec is "pre-consumed" at the end.
|
||||||
|
group_token = result[group_token_index]
|
||||||
|
cg_split = _split_codec_group(group_token.text, kb)
|
||||||
|
codec_pre_consumed = False
|
||||||
|
if cg_split is not None:
|
||||||
|
codec, group = cg_split
|
||||||
|
result[group_token_index] = group_token.with_role(
|
||||||
|
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
|
||||||
|
)
|
||||||
|
codec_pre_consumed = True
|
||||||
|
else:
|
||||||
|
head, _, tail = group_token.text.rpartition("-")
|
||||||
|
result[group_token_index] = group_token.with_role(
|
||||||
|
TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
|
||||||
|
)
|
||||||
|
|
||||||
|
body_end = group_token_index # exclusive
|
||||||
|
tok_idx = 0
|
||||||
|
chunk_idx = 0
|
||||||
|
|
||||||
|
# 1) TITLE — leftmost contiguous tokens up to the first structural
|
||||||
|
# boundary. Title is special because it can be multi-token.
|
||||||
|
while (
|
||||||
|
chunk_idx < len(schema.chunks)
|
||||||
|
and schema.chunks[chunk_idx].role is TokenRole.TITLE
|
||||||
|
):
|
||||||
|
title_end = _find_title_end(result, body_end, kb)
|
||||||
|
for i in range(tok_idx, title_end):
|
||||||
|
result[i] = result[i].with_role(TokenRole.TITLE)
|
||||||
|
tok_idx = title_end
|
||||||
|
chunk_idx += 1
|
||||||
|
|
||||||
|
# 2) Remaining structural chunks. For each, scan forward in the body
|
||||||
|
# for a matching token; tokens passed over remain UNKNOWN.
|
||||||
|
for chunk in schema.chunks[chunk_idx:]:
|
||||||
|
if chunk.role is TokenRole.GROUP:
|
||||||
|
continue
|
||||||
|
if chunk.role is TokenRole.CODEC and codec_pre_consumed:
|
||||||
|
continue
|
||||||
|
|
||||||
|
match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
|
||||||
|
if match_idx is None:
|
||||||
|
if chunk.optional:
|
||||||
|
continue
|
||||||
|
return None
|
||||||
|
|
||||||
|
result[match_idx] = result[match_idx].with_role(chunk.role)
|
||||||
|
tok_idx = match_idx + 1
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def _find_title_end(
|
||||||
|
tokens: list[Token], body_end: int, kb: ReleaseKnowledge
|
||||||
|
) -> int:
|
||||||
|
"""Return the exclusive index where the title ends.
|
||||||
|
|
||||||
|
The title is the leftmost run of tokens whose text does not match
|
||||||
|
any structural role (year, season/episode, resolution, source,
|
||||||
|
codec). Enricher tokens (audio, HDR, language) are *not* boundaries
|
||||||
|
because they can appear in the middle of the structural sequence;
|
||||||
|
however, in canonical scene names they don't appear inside the title
|
||||||
|
itself, so this heuristic holds in practice.
|
||||||
|
"""
|
||||||
|
for i in range(body_end):
|
||||||
|
text = tokens[i].text
|
||||||
|
if _parse_season_episode(text) is not None:
|
||||||
|
return i
|
||||||
|
if _is_year(text):
|
||||||
|
return i
|
||||||
|
lower = text.lower()
|
||||||
|
if lower in kb.resolutions:
|
||||||
|
return i
|
||||||
|
if lower in kb.sources:
|
||||||
|
return i
|
||||||
|
if lower in kb.codecs:
|
||||||
|
return i
|
||||||
|
# codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
|
||||||
|
if "-" in text:
|
||||||
|
head, _, _ = text.rpartition("-")
|
||||||
|
if (
|
||||||
|
head.lower() in kb.codecs
|
||||||
|
or head.lower() in kb.sources
|
||||||
|
or text.lower().replace("-", "") in kb.sources
|
||||||
|
):
|
||||||
|
return i
|
||||||
|
return body_end
|
||||||
|
|
||||||
|
|
||||||
|
def _find_chunk(
|
||||||
|
tokens: list[Token],
|
||||||
|
start: int,
|
||||||
|
end: int,
|
||||||
|
role: TokenRole,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
) -> int | None:
|
||||||
|
"""Return the first index in ``[start, end)`` whose token matches ``role``.
|
||||||
|
|
||||||
|
Returns ``None`` if no token in the range matches. Tokens already
|
||||||
|
annotated (non-UNKNOWN) are skipped — they belong to another chunk.
|
||||||
|
"""
|
||||||
|
for i in range(start, end):
|
||||||
|
if tokens[i].role is not TokenRole.UNKNOWN:
|
||||||
|
continue
|
||||||
|
if _match_role(tokens[i].text, role, kb) is not None:
|
||||||
|
return i
|
||||||
|
return None
|
||||||
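The forward scan used for structural chunks can be reduced to a small sketch: each chunk looks for its first match at or after the cursor, optional chunks may be missing, and a missing mandatory chunk aborts the walk. The `(role, optional)` tuples and the `MATCHERS` table are hypothetical simplifications of `SchemaChunk` and `_match_role`, and title and group handling are left out.

```python
RESOLUTIONS = frozenset({"720p", "1080p", "2160p"})  # assumed kb.resolutions
SOURCES = frozenset({"bluray", "web", "webrip"})  # assumed kb.sources
CODECS = frozenset({"x264", "x265"})  # assumed kb.codecs


def is_year(text):
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


MATCHERS = {
    "year": is_year,
    "resolution": lambda t: t.lower() in RESOLUTIONS,
    "source": lambda t: t.lower() in SOURCES,
    "codec": lambda t: t.lower() in CODECS,
}


def walk_schema(tokens, chunks):
    """Return {token_index: role}, or None if a mandatory chunk is missing."""
    assigned = {}
    cursor = 0
    for role, optional in chunks:
        match = next(
            (i for i in range(cursor, len(tokens)) if MATCHERS[role](tokens[i])),
            None,
        )
        if match is None:
            if optional:
                continue
            return None
        assigned[match] = role
        cursor = match + 1  # later chunks can only match to the right
    return assigned


SCHEMA = [("year", False), ("resolution", False), ("source", False), ("codec", True)]
```

Tokens that are passed over, such as `MULTI` between the year and the resolution, simply stay unassigned, which corresponds to leaving them UNKNOWN for the enricher pass.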
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Stage 2b' — SHITTY annotation (schema-less heuristic)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _annotate_shitty(
|
||||||
|
tokens: list[Token],
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
group_index: int | None,
|
||||||
|
) -> list[Token]:
|
||||||
|
"""Schema-less, dictionary-driven annotation.
|
||||||
|
|
||||||
|
SHITTY's job is narrow: for releases that *look* like scene names
|
||||||
|
but don't have a registered group schema, tag every token whose text
|
||||||
|
falls into a known YAML bucket (resolutions, codecs, sources, …).
|
||||||
|
Anything we can't classify stays UNKNOWN. The leftmost run of
|
||||||
|
UNKNOWN tokens becomes the title. Done.
|
||||||
|
|
||||||
|
Anything that requires more reasoning (parenthesized tech blocks,
|
||||||
|
bare-dashed title fragments, year-disguised slug suffixes, …) is
|
||||||
|
PATH OF PAIN territory and stays out of here on purpose.
|
||||||
|
"""
|
||||||
|
result = list(tokens)
|
||||||
|
|
||||||
|
# 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
|
||||||
|
if group_index is not None:
|
||||||
|
gt = result[group_index]
|
||||||
|
cg_split = _split_codec_group(gt.text, kb)
|
||||||
|
if cg_split is not None:
|
||||||
|
codec, group = cg_split
|
||||||
|
result[group_index] = gt.with_role(
|
||||||
|
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
_, _, tail = gt.text.rpartition("-")
|
||||||
|
result[group_index] = gt.with_role(
|
||||||
|
TokenRole.GROUP, group=tail or "UNKNOWN"
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2) Enrichers (audio / video-meta / edition / language).
|
||||||
|
result = _annotate_enrichers(result, kb)
|
||||||
|
|
||||||
|
# 3) Single pass: tag each UNKNOWN token by looking it up in the kb
|
||||||
|
# buckets. First match wins per token, first occurrence wins per
|
||||||
|
# role (we don't overwrite an already-tagged role).
|
||||||
|
matchers: list[tuple[TokenRole, callable]] = [
|
||||||
|
(TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
|
||||||
|
(TokenRole.YEAR, _is_year),
|
||||||
|
(TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
|
||||||
|
(TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
|
||||||
|
(TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
|
||||||
|
(TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
|
||||||
|
]
|
||||||
|
seen: set[TokenRole] = set()
|
||||||
|
|
||||||
|
for i, tok in enumerate(result):
|
||||||
|
if tok.role is not TokenRole.UNKNOWN:
|
||||||
|
continue
|
||||||
|
for role, matches in matchers:
|
||||||
|
if role in seen:
|
||||||
|
continue
|
||||||
|
if matches(tok.text):
|
||||||
|
result[i] = tok.with_role(role)
|
||||||
|
seen.add(role)
|
||||||
|
break
|
||||||
|
|
||||||
|
# 4) Title = leftmost contiguous UNKNOWN tokens.
|
||||||
|
for i, tok in enumerate(result):
|
||||||
|
if tok.role is not TokenRole.UNKNOWN:
|
||||||
|
break
|
||||||
|
result[i] = tok.with_role(TokenRole.TITLE)
|
||||||
|
|
||||||
|
return result
|
||||||
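The "first match wins per token, first occurrence wins per role" rule, followed by the leftmost-UNKNOWN title rule, can be sketched on plain strings. This is a reduced illustration: it keeps only four of the matchers, skips group and enricher handling, and uses assumed knowledge sets.

```python
RESOLUTIONS = frozenset({"720p", "1080p", "2160p"})  # assumed kb.resolutions
SOURCES = frozenset({"bluray", "web", "webrip"})  # assumed kb.sources
CODECS = frozenset({"x264", "x265"})  # assumed kb.codecs


def is_year(text):
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


def annotate(tokens):
    matchers = [
        ("year", is_year),
        ("resolution", lambda t: t.lower() in RESOLUTIONS),
        ("source", lambda t: t.lower() in SOURCES),
        ("codec", lambda t: t.lower() in CODECS),
    ]
    roles = ["unknown"] * len(tokens)
    seen = set()

    for i, text in enumerate(tokens):
        for role, matches in matchers:
            if role in seen:
                continue  # a role is assigned at most once
            if matches(text):
                roles[i] = role
                seen.add(role)
                break  # first matching role wins for this token

    # Title = leftmost contiguous run of still-unknown tokens.
    for i, role in enumerate(roles):
        if role != "unknown":
            break
        roles[i] = "title"

    return roles
```

One consequence worth noting in this sketch: for a name containing two year-like tokens, the first one takes the year role and the second stays unknown, so a numeric title fragment that looks like a year ends the title early.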
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Stage 2c: enricher pass (non-positional roles)
# ---------------------------------------------------------------------------


def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Tag the remaining UNKNOWN tokens with non-positional roles.

    Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
    a single-token ``DTS``). For each sequence match, the first token
    receives the role + ``extra["sequence"]`` (the canonical joined
    value), and the trailing members are marked with the same role +
    ``extra["sequence_member"]="True"`` (a string flag, since ``extra``
    holds strings) so :func:`assemble` extracts the value only from the
    primary.
    """
    result = list(tokens)

    # Multi-token sequences first.
    _apply_sequences(
        result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
    )
    _apply_sequences(
        result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
    )
    _apply_sequences(
        result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
    )

    # Single tokens.
    known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
    known_audio_channels = set(kb.audio.get("channels", []))
    known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
    known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
    known_editions = {t.upper() for t in kb.editions.get("tokens", [])}

    # Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
    # because "." is a separator. Detect consecutive pairs whose joined
    # value (without any trailing "-GROUP") is in the channel set.
    _detect_channel_pairs(result, known_audio_channels)

    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            continue
        text = tok.text
        upper = text.upper()
        lower = text.lower()

        if upper in known_audio_codecs:
            result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
            continue
        if text in known_audio_channels:
            result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
            continue
        if upper in known_hdr:
            result[i] = tok.with_role(TokenRole.HDR)
            continue
        if lower in known_bit_depth:
            result[i] = tok.with_role(TokenRole.BIT_DEPTH)
            continue
        if upper in known_editions:
            result[i] = tok.with_role(TokenRole.EDITION)
            continue
        if upper in kb.language_tokens:
            result[i] = tok.with_role(TokenRole.LANGUAGE)
            continue
        if upper in kb.distributors:
            result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
            continue

    return result
|
||||||
|
|
||||||
|
|
||||||
|
def _apply_sequences(
    tokens: list[Token],
    sequences: list[dict],
    value_key: str,
    role: TokenRole,
) -> None:
    """Mark each non-overlapping occurrence of every sequence in place.

    Mutates ``tokens`` (replacing entries with new role-tagged Token
    instances). Sequences in the YAML must be ordered most-specific
    first; the first match wins per starting position.
    """
    if not sequences:
        return

    upper_texts = [t.text.upper() for t in tokens]
    consumed: set[int] = set()

    for seq in sequences:
        seq_upper = [s.upper() for s in seq["tokens"]]
        n = len(seq_upper)
        for start in range(len(tokens) - n + 1):
            # Skip windows that overlap an earlier sequence match.
            if any(idx in consumed for idx in range(start, start + n)):
                continue
            # Skip windows containing a token that already has a role.
            if any(
                tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
            ):
                continue
            if upper_texts[start : start + n] == seq_upper:
                tokens[start] = tokens[start].with_role(
                    role, sequence=seq[value_key]
                )
                for k in range(1, n):
                    tokens[start + k] = tokens[start + k].with_role(
                        role, sequence_member="True"
                    )
                consumed.update(range(start, start + n))
|
||||||
|
|
||||||
|
|
||||||
|
def _detect_channel_pairs(
    tokens: list[Token], known_channels: set[str]
) -> None:
    """Spot two consecutive numeric tokens that form a channel layout.

    Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
    ``-GROUP`` suffix on the second). The second token may be the trailing
    codec-GROUP token, in which case it is already tagged CODEC and is
    left untouched, since retagging it would corrupt its role.
    """
    for i in range(len(tokens) - 1):
        first = tokens[i]
        second = tokens[i + 1]
        if first.role is not TokenRole.UNKNOWN:
            continue
        # Strip a "-GROUP" suffix on the second token before joining.
        second_text = second.text.split("-")[0]
        candidate = f"{first.text}.{second_text}"
        if candidate not in known_channels:
            continue
        # Only tag the first token (carries the channel value). The
        # second token may legitimately remain UNKNOWN (or be the
        # codec-GROUP token, already tagged CODEC).
        tokens[i] = first.with_role(
            TokenRole.AUDIO_CHANNELS, sequence=candidate
        )
        if second.role is TokenRole.UNKNOWN:
            tokens[i + 1] = second.with_role(
                TokenRole.AUDIO_CHANNELS, sequence_member="True"
            )
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Stage 2 entry point
# ---------------------------------------------------------------------------


def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Annotate token roles.

    Dispatch:

    * If a group is detected AND has a known schema, run the EASY
      structural walk. If the schema walk aborts on a mandatory chunk
      mismatch, fall through to SHITTY (the heuristic still does better
      than giving up).
    * Otherwise run SHITTY: schema-less, best-effort, never aborts.

    The enricher pass runs in both cases. The pipeline always returns a
    populated token list; downstream callers don't need to distinguish
    EASY vs SHITTY at this layer (the parse_path is decided in the
    service based on whether a schema matched).
    """
    group_name, group_index = _detect_group(tokens, kb)

    schema = kb.group_schema(group_name) if group_index is not None else None
    if schema is not None and group_index is not None:
        structural = _annotate_structural(tokens, kb, schema, group_index)
        if structural is not None:
            return _annotate_enrichers(structural, kb)

    # SHITTY fallback: heuristic positional pass. ``_annotate_shitty``
    # runs its own enricher pass internally (it has to, so the title
    # scan can skip enricher-tagged tokens).
    return _annotate_shitty(tokens, kb, group_index)


def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
    """Return True if ``tokens`` end in a group that has a known schema.

    :func:`annotate` attempts the EASY path for such tokens, but it can
    still fall back to SHITTY if the schema walk aborts; this function
    does not run the walk.
    """
    group_name, group_index = _detect_group(tokens, kb)
    if group_index is None:
        return False
    return kb.group_schema(group_name) is not None
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Stage 3: assemble
# ---------------------------------------------------------------------------


def assemble(
    annotated: list[Token],
    site_tag: str | None,
    raw_name: str,
    kb: ReleaseKnowledge,
) -> dict:
    """Fold annotated tokens into a ``ParsedRelease``-compatible dict.

    Returns a dict (not a ``ParsedRelease`` instance) so the caller can
    layer in additional fields (``parse_path``, ``raw``, …) before
    instantiation.
    """
    # Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
    # human-friendly release names) carry no title content and would leak
    # into the joined title as ``"Show.-.Episode"``. Drop them here.
    title_parts = [
        t.text
        for t in annotated
        if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
    ]
    title = ".".join(title_parts) if title_parts else (
        annotated[0].text if annotated else raw_name
    )

    year: int | None = None
    season: int | None = None
    episode: int | None = None
    episode_end: int | None = None
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    group = "UNKNOWN"
    audio_codec: str | None = None
    audio_channels: str | None = None
    bit_depth: str | None = None
    hdr_format: str | None = None
    edition: str | None = None
    distributor: str | None = None
    languages: list[str] = []
    is_season_range = False

    for tok in annotated:
        # Skip non-primary members of a multi-token sequence.
        if tok.extra.get("sequence_member") == "True":
            continue

        role = tok.role
        if role is TokenRole.YEAR:
            year = int(tok.text)
        elif role is TokenRole.SEASON_EPISODE:
            parsed = _parse_season_episode(tok.text)
            if parsed is not None:
                season, episode, episode_end = parsed
                # Detect Sxx-yy range form to flag it as a multi-season pack.
                upper = tok.text.upper()
                if (
                    len(upper) == 6
                    and upper[0] == "S"
                    and upper[1:3].isdigit()
                    and upper[3] == "-"
                    and upper[4:6].isdigit()
                ):
                    is_season_range = True
        elif role is TokenRole.RESOLUTION:
            quality = tok.text
        elif role is TokenRole.SOURCE:
            source = tok.text
        elif role is TokenRole.CODEC:
            codec = tok.extra.get("codec", tok.text)
            if "group" in tok.extra:
                group = tok.extra["group"] or "UNKNOWN"
        elif role is TokenRole.GROUP:
            group = tok.extra.get("group", tok.text) or "UNKNOWN"
        elif role is TokenRole.AUDIO_CODEC:
            if audio_codec is None:
                audio_codec = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.AUDIO_CHANNELS:
            if audio_channels is None:
                audio_channels = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.BIT_DEPTH:
            if bit_depth is None:
                bit_depth = tok.text.lower()
        elif role is TokenRole.HDR:
            if hdr_format is None:
                hdr_format = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.EDITION:
            if edition is None:
                edition = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.LANGUAGE:
            languages.append(tok.text.upper())
        elif role is TokenRole.DISTRIBUTOR:
            if distributor is None:
                distributor = tok.text.upper()

    # Media type heuristic. Doc/concert/integrale tokens win over the
    # generic tech-based fallback. We look across all tokens (not just
    # role-tagged ones) because these markers may be left UNKNOWN by the
    # structural pass; only the assemble step cares about them.
    upper_tokens = {tok.text.upper() for tok in annotated}
    doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
    concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
    integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}

    if upper_tokens & doc_tokens:
        media_type = MediaTypeToken.DOCUMENTARY
    elif upper_tokens & concert_tokens:
        media_type = MediaTypeToken.CONCERT
    elif is_season_range:
        media_type = MediaTypeToken.TV_COMPLETE
    elif (
        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
        or upper_tokens & integrale_tokens
    ) and season is None:
        media_type = MediaTypeToken.TV_COMPLETE
    elif season is not None:
        media_type = MediaTypeToken.TV_SHOW
    elif any((quality, source, codec, year)):
        media_type = MediaTypeToken.MOVIE
    else:
        media_type = MediaTypeToken.UNKNOWN

    return {
        "title": title,
        "title_sanitized": kb.sanitize_for_fs(title),
        "year": year,
        "season": season,
        "episode": episode,
        "episode_end": episode_end,
        "quality": quality,
        "source": source,
        "codec": codec,
        "group": group,
        "media_type": media_type,
        "site_tag": site_tag,
        "languages": tuple(languages),
        "audio_codec": audio_codec,
        "audio_channels": audio_channels,
        "bit_depth": bit_depth,
        "hdr_format": hdr_format,
        "edition": edition,
        "distributor": distributor,
    }
|
||||||
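The precedence of the media-type heuristic at the end of `assemble` can be sketched as a standalone function. This is an illustration only: `MediaType`, its string values, `pick_media_type` and the marker sets are hypothetical stand-ins (the real `MediaTypeToken` values and the `kb.media_type_tokens` contents are not shown in this diff).

```python
from __future__ import annotations

from enum import Enum


class MediaType(str, Enum):
    # Hypothetical stand-in for MediaTypeToken; values are assumed.
    DOCUMENTARY = "documentary"
    CONCERT = "concert"
    TV_COMPLETE = "tv_complete"
    TV_SHOW = "tv_show"
    MOVIE = "movie"
    UNKNOWN = "unknown"


def pick_media_type(
    tokens: set[str],
    *,
    doc: set[str],
    concert: set[str],
    integrale: set[str],
    season: int | None = None,
    edition: str | None = None,
    is_season_range: bool = False,
    has_tech: bool = False,
) -> MediaType:
    """Mirror the order of the if/elif chain: markers first, tech last."""
    if tokens & doc:
        return MediaType.DOCUMENTARY
    if tokens & concert:
        return MediaType.CONCERT
    if is_season_range:
        return MediaType.TV_COMPLETE
    complete_edition = edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
    if (complete_edition or tokens & integrale) and season is None:
        return MediaType.TV_COMPLETE
    if season is not None:
        return MediaType.TV_SHOW
    if has_tech:
        return MediaType.MOVIE
    return MediaType.UNKNOWN


# Assumed marker sets, for illustration only.
markers = {"doc": {"DOC", "DOCU"}, "concert": {"CONCERT"}, "integrale": {"INTEGRALE"}}
print(pick_media_type({"SHOW", "S01E02", "DOCU"}, season=1, **markers).value)  # documentary
print(pick_media_type({"SHOW", "S01E02"}, season=1, **markers).value)          # tv_show
print(pick_media_type({"FILM", "2020", "1080P"}, has_tech=True, **markers).value)  # movie
```

The first call shows that a documentary marker wins even when a season was parsed, which follows from the branch order.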
@@ -0,0 +1,47 @@
|
|||||||
"""Group schema value objects.

A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.

Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""

from __future__ import annotations

from dataclasses import dataclass

from .tokens import TokenRole


@dataclass(frozen=True)
class SchemaChunk:
    """One entry in a group's chunk order.

    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
    is True for chunks that may be absent (e.g. ``year`` on TV releases,
    ``source`` on bare ELiTE TV releases).
    """

    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    """Schema for a known release group.

    ``chunks`` is the left-to-right canonical order. The annotator walks
    tokens and chunks in lockstep: an optional chunk that doesn't match
    the current token is skipped (the chunk index advances, the token
    index stays), a mandatory chunk that doesn't match aborts the EASY
    path and falls back to SHITTY.
    """

    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]
|
||||||
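The lockstep walk described in the `GroupSchema` docstring can be sketched as follows. This is a minimal sketch under assumptions, not the annotator's actual `_annotate_structural`: `Chunk`, `walk`, the string roles and the matcher table are all hypothetical, titles are left out, and leftover tokens are simply ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    role: str
    optional: bool = False


def walk(
    tokens: list[str],
    chunks: tuple[Chunk, ...],
    matchers: dict[str, Callable[[str], bool]],
) -> list[tuple[str, str]] | None:
    """Pair tokens with chunk roles; return None on a mandatory mismatch."""
    tagged: list[tuple[str, str]] = []
    ti = 0  # token index; only advances when a chunk matches
    for chunk in chunks:
        if ti < len(tokens) and matchers[chunk.role](tokens[ti]):
            tagged.append((tokens[ti], chunk.role))
            ti += 1
        elif chunk.optional:
            continue  # chunk index advances, token index stays
        else:
            return None  # mandatory chunk missing: abort the structured path
    return tagged


# Assumed matchers and schema, for illustration only.
MATCHERS: dict[str, Callable[[str], bool]] = {
    "year": lambda t: len(t) == 4 and t.isdigit(),
    "resolution": lambda t: t.lower() in {"720p", "1080p", "2160p"},
    "source": lambda t: t.upper() in {"WEB", "BLURAY"},
    "codec": lambda t: t.lower() in {"x264", "x265"},
}
SCHEMA = (
    Chunk("year", optional=True),
    Chunk("resolution"),
    Chunk("source", optional=True),
    Chunk("codec"),
)

print(walk(["1080p", "WEB", "x264"], SCHEMA, MATCHERS))
print(walk(["WEB", "x264"], SCHEMA, MATCHERS))  # None: resolution is mandatory
```

Returning `None` rather than raising mirrors the described behaviour where an aborted walk lets the caller fall back to the heuristic path.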
@@ -0,0 +1,139 @@
|
|||||||
"""Parse-confidence scoring.

``parse_release`` returns a :class:`ParseReport` alongside its
:class:`ParsedRelease`. The report carries:

- ``confidence``: integer 0–100 derived from which structural and
  technical fields got populated, minus a penalty per UNKNOWN token
  left in the annotated stream.
- ``road``: which of the three roads the parse took
  (:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
- ``unknown_tokens``: textual residue, useful for diagnostics.
- ``missing_critical``: structural fields the score-tally found absent
  (e.g. ``("year", "media_type")``); the caller can use this to drive
  PoP recovery (questions, LLM call).

All weights, penalties and thresholds come from the injected knowledge
base (``kb.scoring``), itself loaded from
``alfred/knowledge/release/scoring.yaml``. No magic numbers here.

The scoring functions are pure: they consume the annotated token list
and the resulting :class:`ParsedRelease` and return the report. They are
called by ``services.parse_release`` after ``assemble`` has run.
"""

from __future__ import annotations

from enum import Enum

from ..ports.knowledge import ReleaseKnowledge
from ..value_objects import ParsedRelease
from .tokens import Token, TokenRole


class Road(str, Enum):
    """How the parser handled a given release name.

    Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
    which records the tokenization route (DIRECT / SANITIZED / AI). Road
    is about confidence in the *result*, not the *method*.
    """

    EASY = "easy"  # group schema matched, structural annotation
    SHITTY = "shitty"  # no schema, dict-driven annotation, score ≥ threshold
    PATH_OF_PAIN = "path_of_pain"  # score below threshold, needs help


# Critical structural fields: their absence drives the
# ``missing_critical`` list in the report.
_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")


def _is_tv_shaped(parsed: ParsedRelease) -> bool:
    """Season/episode weights only count for releases that *look* like TV."""
    return parsed.season is not None


def compute_score(
    parsed: ParsedRelease,
    annotated: list[Token],
    kb: ReleaseKnowledge,
) -> int:
    """Compute a 0–100 confidence score for the parse.

    Each populated field contributes its weight from
    ``kb.scoring["weights"]``. Season/episode only count when the parse
    looks like TV. ``group == "UNKNOWN"`` is treated as absent.

    Then a penalty is subtracted per residual UNKNOWN token in
    ``annotated``, capped at ``penalties["max_unknown_penalty"]``.

    Result is clamped to ``[0, 100]``.
    """
    weights = kb.scoring["weights"]
    penalties = kb.scoring["penalties"]

    score = 0
    if parsed.title:
        score += weights.get("title", 0)
    if parsed.media_type and parsed.media_type.value != "unknown":
        score += weights.get("media_type", 0)
    if parsed.year is not None:
        score += weights.get("year", 0)
    if _is_tv_shaped(parsed):
        if parsed.season is not None:
            score += weights.get("season", 0)
        if parsed.episode is not None:
            score += weights.get("episode", 0)
    if parsed.quality:
        score += weights.get("resolution", 0)
    if parsed.source:
        score += weights.get("source", 0)
    if parsed.codec:
        score += weights.get("codec", 0)
    if parsed.group and parsed.group != "UNKNOWN":
        score += weights.get("group", 0)

    unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
    capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
    score -= capped_penalty

    return max(0, min(100, score))


def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
    """Return the text of every token still tagged UNKNOWN."""
    return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)


def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
    """Return the names of critical structural fields that are absent."""
    missing: list[str] = []
    if not parsed.title:
        missing.append("title")
    if not parsed.media_type or parsed.media_type.value == "unknown":
        missing.append("media_type")
    if parsed.year is None:
        missing.append("year")
    return tuple(missing)


def decide_road(
    score: int,
    has_schema: bool,
    kb: ReleaseKnowledge,
) -> Road:
    """Pick the road the parse took.

    EASY is decided structurally: if a known group schema matched, the
    annotation walked the schema, and that's enough: the score does not
    veto EASY. Otherwise the score decides between SHITTY and
    PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
    """
    if has_schema:
        return Road.EASY
    threshold = kb.scoring["thresholds"].get("shitty_min", 60)
    if score >= threshold:
        return Road.SHITTY
    return Road.PATH_OF_PAIN
|
||||||
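A worked example of the scoring arithmetic may help. This is a sketch with made-up numbers: the `SCORING` dict below is hypothetical (the real weights live in `scoring.yaml`, which this diff does not show), and `confidence` / `road` are simplified stand-ins that take a set of populated field names instead of a `ParsedRelease`.

```python
from __future__ import annotations

# Hypothetical values for illustration; not the project's scoring.yaml.
SCORING = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15,
        "resolution": 10, "source": 10, "codec": 10, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 10},
    "thresholds": {"shitty_min": 60},
}


def confidence(populated: set[str], unknown_count: int, scoring: dict) -> int:
    """Sum the weights of populated fields, subtract the capped penalty, clamp."""
    weights = scoring["weights"]
    penalties = scoring["penalties"]
    score = sum(weights.get(name, 0) for name in populated)
    penalty = min(
        unknown_count * penalties.get("unknown_token", 0),
        penalties.get("max_unknown_penalty", 0),
    )
    return max(0, min(100, score - penalty))


def road(score: int, has_schema: bool, scoring: dict) -> str:
    """A matched schema always means "easy"; otherwise the threshold decides."""
    if has_schema:
        return "easy"
    if score >= scoring["thresholds"].get("shitty_min", 60):
        return "shitty"
    return "path_of_pain"


# 30 + 20 + 15 + 10 = 75; three unknowns cost min(3 * 5, 10) = 10.
score = confidence({"title", "media_type", "year", "resolution"}, 3, SCORING)
print(score, road(score, False, SCORING))  # 65 shitty
```

Note that the cap bounds the total penalty, so a name with many stray tokens cannot lose more than `max_unknown_penalty` points from residue alone.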
@@ -0,0 +1,90 @@
|
|||||||
"""Token value objects for the annotate-based parser.

A :class:`Token` carries both the original substring and its position in
the original release name's token stream. A :class:`TokenRole` is the
semantic tag assigned by the annotator.

Why VOs instead of bare ``str``: the annotate step needs to flag tokens
without consuming them (a token may carry residual info, e.g. a
``codec-GROUP`` token contributes both a CODEC and a GROUP role). Tracking
the index also lets later stages reason about *order* (year must come
after title, group must be rightmost, etc.) without re-scanning the list.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TokenRole(str, Enum):
    """Semantic role a token can take after annotation.

    A token starts as ``UNKNOWN`` and may be promoted by the annotator.
    ``str``-backed for cheap comparisons and YAML/JSON interop.

    Roles split into three families:

    - **structural**: TITLE / YEAR / SEASON_EPISODE / GROUP, which drive
      folder and filename naming.
    - **technical**: RESOLUTION / SOURCE / CODEC / AUDIO_CODEC /
      AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE /
      DISTRIBUTOR, which feed ``tech_string`` and metadata fields.
    - **meta**: SITE_TAG (stripped pre-tokenize) and UNKNOWN (residual,
      contributes to the SHITTY score penalty).
    """

    UNKNOWN = "unknown"

    # Structural
    TITLE = "title"
    YEAR = "year"
    SEASON_EPISODE = "season_episode"
    GROUP = "group"

    # Technical
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"
    AUDIO_CODEC = "audio_codec"
    AUDIO_CHANNELS = "audio_channels"
    BIT_DEPTH = "bit_depth"
    HDR = "hdr"
    EDITION = "edition"
    LANGUAGE = "language"
    DISTRIBUTOR = "distributor"

    # Meta
    SITE_TAG = "site_tag"


@dataclass(frozen=True)
class Token:
    """An atomic token from a release name.

    ``text`` is the substring exactly as it appeared after tokenization
    (case preserved; uppercase comparisons happen at match time).
    ``index`` is the 0-based position in the tokenized stream, used by
    downstream stages to enforce ordering invariants.

    ``role`` defaults to :attr:`TokenRole.UNKNOWN`. The annotator returns
    new :class:`Token` instances with the role set rather than mutating
    (the dataclass is frozen). ``extra`` carries role-specific payload
    when the token text alone isn't enough (e.g. a ``codec-GROUP`` token
    annotated as CODEC may record the group name in ``extra["group"]``).
    """

    text: str
    index: int
    role: TokenRole = TokenRole.UNKNOWN
    extra: dict[str, str] = field(default_factory=dict)

    def with_role(self, role: TokenRole, **extra: str) -> Token:
        """Return a copy of this token with ``role`` (and optional ``extra``)."""
        merged = {**self.extra, **extra} if extra else self.extra
        return Token(text=self.text, index=self.index, role=role, extra=merged)

    @property
    def is_annotated(self) -> bool:
        return self.role is not TokenRole.UNKNOWN
|
||||||
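The copy-on-tag behaviour of `with_role` can be illustrated with a reduced stand-in. `Tok` below is a hypothetical, trimmed-down version of `Token` (plain string roles, no `index`) written only for this example; it keeps the same merge logic for `extra`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Tok:
    text: str
    role: str = "unknown"
    extra: dict[str, str] = field(default_factory=dict)

    def with_role(self, role: str, **extra: str) -> Tok:
        # Same merge rule as Token.with_role: new keys are layered on top
        # of the old ones; with no new keys the existing dict is reused.
        merged = {**self.extra, **extra} if extra else self.extra
        return Tok(text=self.text, role=role, extra=merged)


raw = Tok("x265-KTH")
tagged = raw.with_role("codec", codec="x265", group="KTH")

print(raw.role, tagged.role)  # unknown codec
print(tagged.extra)           # {'codec': 'x265', 'group': 'KTH'}

# Attribute assignment on a frozen dataclass raises.
try:
    raw.role = "codec"  # type: ignore[misc]
except FrozenInstanceError:
    print("frozen")

# Retagging without new payload shares the same dict object.
retagged = tagged.with_role("group")
print(retagged.extra is tagged.extra)  # True
```

The last line is worth keeping in mind: `frozen=True` only blocks attribute assignment, so a caller that mutates a shared `extra` dict in place would affect both tokens.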
@@ -10,7 +10,10 @@ object that satisfies this shape (e.g. a simple dataclass).
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from typing import Protocol
|
from typing import TYPE_CHECKING, Protocol
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from ..parser.schema import GroupSchema
|
||||||
|
|
||||||
|
|
||||||
class ReleaseKnowledge(Protocol):
|
class ReleaseKnowledge(Protocol):
|
||||||
@@ -21,6 +24,7 @@ class ReleaseKnowledge(Protocol):
|
|||||||
resolutions: set[str]
|
resolutions: set[str]
|
||||||
sources: set[str]
|
sources: set[str]
|
||||||
codecs: set[str]
|
codecs: set[str]
|
||||||
|
distributors: set[str]
|
||||||
language_tokens: set[str]
|
language_tokens: set[str]
|
||||||
forbidden_chars: set[str]
|
forbidden_chars: set[str]
|
||||||
hdr_extra: set[str]
|
hdr_extra: set[str]
|
||||||
@@ -36,6 +40,30 @@ class ReleaseKnowledge(Protocol):
|
|||||||
|
|
||||||
separators: list[str]
|
separators: list[str]
|
||||||
|
|
||||||
|
# --- Parse scoring (Phase A) ---
|
||||||
|
#
|
||||||
|
# ``scoring`` is a dict with three keys:
|
||||||
|
# - ``weights``: dict[field_name, int] field weight contribution
|
||||||
|
# - ``penalties``: {"unknown_token": int, "max_unknown_penalty": int}
|
||||||
|
# - ``thresholds``: {"shitty_min": int} SHITTY vs PATH_OF_PAIN cutoff
|
||||||
|
#
|
||||||
|
# Concrete values come from ``alfred/knowledge/release/scoring.yaml``.
|
||||||
|
# The loader fills in safe defaults so this dict is always populated.
|
||||||
|
|
||||||
|
scoring: dict
|
||||||
|
|
||||||
|
# --- ffprobe → scene-token translation tables (consumed by
|
||||||
|
# ``application.release.enrich_from_probe``). Domain parsing itself
|
||||||
|
# doesn't touch these — exposed on the same KB to keep release
|
||||||
|
# knowledge in a single ownership point.
|
||||||
|
#
|
||||||
|
# Shape:
|
||||||
|
# - ``video_codec``: dict[str, str] ffprobe lower → scene token
|
||||||
|
# - ``audio_codec``: dict[str, str] ffprobe lower → scene token
|
||||||
|
# - ``audio_channels``: dict[int, str] channel count → layout ---
|
||||||
|
|
||||||
|
probe_mappings: dict
|
||||||
|
|
||||||
# --- File-extension sets (used by application/infra modules that work
|
# --- File-extension sets (used by application/infra modules that work
|
||||||
# directly with filesystem paths, e.g. media-type detection, video
|
# directly with filesystem paths, e.g. media-type detection, video
|
||||||
# lookup). Domain parsing itself doesn't touch these. ---
|
# lookup). Domain parsing itself doesn't touch these. ---
|
||||||
@@ -50,3 +78,14 @@ class ReleaseKnowledge(Protocol):
|
|||||||
def sanitize_for_fs(self, text: str) -> str:
|
def sanitize_for_fs(self, text: str) -> str:
|
||||||
"""Strip filesystem-forbidden characters from ``text``."""
|
"""Strip filesystem-forbidden characters from ``text``."""
|
||||||
...
|
...
|
||||||
|
|
||||||
|
# --- Release group schemas (EASY path) ---
|
||||||
|
|
||||||
|
def group_schema(self, name: str) -> GroupSchema | None:
|
||||||
|
"""Return the parsing schema for the named release group, or
|
||||||
|
``None`` if the group is unknown (caller falls back to SHITTY).
|
||||||
|
|
||||||
|
Lookup is case-insensitive: ``"KONTRAST"``, ``"kontrast"`` and
|
||||||
|
``"Kontrast"`` all resolve to the same schema.
|
||||||
|
"""
|
||||||
|
...
|
||||||
|
|||||||
@@ -1,43 +1,68 @@
|
|||||||
"""Release domain — parsing service."""
|
"""Release domain — parsing service.
|
||||||
|
|
||||||
|
Thin orchestrator over the annotate-based pipeline in
|
||||||
|
:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:
|
||||||
|
|
||||||
|
* Strip a leading/trailing ``[site.tag]`` and decide ``parse_path``.
|
||||||
|
* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
|
||||||
|
the LLM can clean them up.
|
||||||
|
* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
|
||||||
|
wrap the result in :class:`ParsedRelease`.
|
||||||
|
* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
|
||||||
+via :mod:`alfred.domain.release.parser.scoring`.
+
+The public entry point is :func:`parse_release`, which returns
+``(ParsedRelease, ParseReport)``. The report carries the confidence
+score, the road, and diagnostic info for downstream callers.
+"""
 
 from __future__ import annotations
 
-import re
-
+from .parser import pipeline as _v2
+from .parser import scoring as _scoring
 from .ports import ReleaseKnowledge
-from .value_objects import MediaTypeToken, ParsedRelease, ParsePath
+from .value_objects import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute
 
 
-def _tokenize(name: str, kb: ReleaseKnowledge) -> list[str]:
-    """Split a release name on the configured separators, dropping empty tokens."""
-    pattern = "[" + re.escape("".join(kb.separators)) + "]+"
-    return [t for t in re.split(pattern, name) if t]
+def parse_release(
+    name: str, kb: ReleaseKnowledge
+) -> tuple[ParsedRelease, ParseReport]:
+    """Parse a release name.
 
-
-def parse_release(name: str, kb: ReleaseKnowledge) -> ParsedRelease:
-    """
-    Parse a release name and return a ParsedRelease.
+    Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
+    is unchanged from the previous single-return contract; the report
+    is new and carries the confidence score + road decision.
 
     Flow:
-      1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
-      2. Check the remainder for truly forbidden chars (anything not in the
-         configured separators list). If any remain → media_type="unknown",
-         parse_path="ai", and the LLM handles it.
-      3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
-         and run token-level matchers (season/episode, tech, languages, audio,
-         video, edition, title, year).
-    """
-    parse_path = ParsePath.DIRECT.value
-
-    # Always try to extract a bracket-enclosed site tag first.
-    clean, site_tag = _strip_site_tag(name)
+      1. Strip a leading/trailing ``[site.tag]`` if present (sets
+         ``parse_path="sanitized"``).
+      2. If the remainder still contains truly forbidden chars (anything
+         not in the configured separators), short-circuit to
+         ``media_type="unknown"`` / ``parse_path="ai"`` and emit a
+         PATH_OF_PAIN report — the LLM handles these.
+      3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
+         group schema is known, SHITTY otherwise) → assemble → score.
+    """
+    parse_path = TokenizationRoute.DIRECT
+
+    # Apostrophes inside titles ("Don't", "L'avare") are common and should
+    # not push the release through the AI fallback. Strip them up front so
+    # both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
+    # enough for token-level matching. The raw name is preserved on the VO.
+    working_name = name
+    if "'" in working_name:
+        working_name = working_name.replace("'", "")
+        parse_path = TokenizationRoute.SANITIZED
+
+    clean, site_tag = _v2.strip_site_tag(working_name)
     if site_tag is not None:
-        parse_path = ParsePath.SANITIZED.value
+        parse_path = TokenizationRoute.SANITIZED
 
     if not _is_well_formed(clean, kb):
-        return ParsedRelease(
+        parsed = ParsedRelease(
             raw=name,
-            normalised=clean,
+            clean=clean,
             title=clean,
             title_sanitized=kb.sanitize_for_fs(clean),
             year=None,
@@ -48,459 +73,49 @@ def parse_release(name: str, kb: ReleaseKnowledge) -> ParsedRelease:
             source=None,
             codec=None,
             group="UNKNOWN",
-            tech_string="",
-            media_type=MediaTypeToken.UNKNOWN.value,
+            media_type=MediaTypeToken.UNKNOWN,
             site_tag=site_tag,
-            parse_path=ParsePath.AI.value,
+            parse_path=TokenizationRoute.AI,
         )
-
-    name = clean
-    tokens = _tokenize(name, kb)
-
-    season, episode, episode_end = _extract_season_episode(tokens)
-    quality, source, codec, group, tech_tokens = _extract_tech(tokens, kb)
-    languages, lang_tokens = _extract_languages(tokens, kb)
-    audio_codec, audio_channels, audio_tokens = _extract_audio(tokens, kb)
-    bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens, kb)
-    edition, edition_tokens = _extract_edition(tokens, kb)
-    title = _extract_title(
-        tokens,
-        tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
-        kb,
-    )
-    year = _extract_year(tokens, title)
-    media_type = _infer_media_type(
-        season, quality, source, codec, year, edition, tokens, kb
-    )
+        report = ParseReport(
+            confidence=0,
+            road=_scoring.Road.PATH_OF_PAIN.value,
+            unknown_tokens=(clean,),
+            missing_critical=("title", "media_type", "year"),
+        )
+        return parsed, report
 
-    tech_parts = [p for p in [quality, source, codec] if p]
-    tech_string = ".".join(tech_parts)
+    tokens, v2_tag = _v2.tokenize(working_name, kb)
+    annotated = _v2.annotate(tokens, kb)
+    fields = _v2.assemble(annotated, v2_tag, name, kb)
 
-    return ParsedRelease(
+    parsed = ParsedRelease(
         raw=name,
-        normalised=name,
-        title=title,
-        title_sanitized=kb.sanitize_for_fs(title),
-        year=year,
-        season=season,
-        episode=episode,
-        episode_end=episode_end,
-        quality=quality,
-        source=source,
-        codec=codec,
-        group=group,
-        tech_string=tech_string,
-        media_type=media_type,
-        site_tag=site_tag,
+        clean=clean,
         parse_path=parse_path,
-        languages=languages,
-        audio_codec=audio_codec,
-        audio_channels=audio_channels,
-        bit_depth=bit_depth,
-        hdr_format=hdr_format,
-        edition=edition,
+        **fields,
     )
 
-
-def _infer_media_type(
-    season: int | None,
-    quality: str | None,
-    source: str | None,
-    codec: str | None,
-    year: int | None,
-    edition: str | None,
-    tokens: list[str],
-    kb: ReleaseKnowledge,
-) -> str:
-    """
-    Infer media_type from token-level evidence only (no filesystem access).
-
-    - documentary : DOC token present
-    - concert : CONCERT token present
-    - tv_complete : INTEGRALE/COMPLETE token, no season
-    - tv_show : season token found
-    - movie : no season, at least one tech marker
-    - unknown : no conclusive evidence
-    """
-    upper_tokens = {t.upper() for t in tokens}
-
-    doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
-    concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
-    integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
-
-    if upper_tokens & doc_tokens:
-        return MediaTypeToken.DOCUMENTARY.value
-    if upper_tokens & concert_tokens:
-        return MediaTypeToken.CONCERT.value
-    if (
-        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
-        or upper_tokens & integrale_tokens
-    ) and season is None:
-        return MediaTypeToken.TV_COMPLETE.value
-    if season is not None:
-        return MediaTypeToken.TV_SHOW.value
-    if any([quality, source, codec, year]):
-        return MediaTypeToken.MOVIE.value
-    return MediaTypeToken.UNKNOWN.value
+    has_schema = _v2.has_known_schema(tokens, kb)
+    score = _scoring.compute_score(parsed, annotated, kb)
+    road = _scoring.decide_road(score, has_schema, kb)
+    report = ParseReport(
+        confidence=score,
+        road=road.value,
+        unknown_tokens=_scoring.collect_unknown_tokens(annotated),
+        missing_critical=_scoring.collect_missing_critical(parsed),
+    )
+    return parsed, report
 
 
 def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
-    """Return True if name contains no forbidden characters per scene naming rules.
+    """Return True if ``name`` contains no forbidden characters per scene
+    naming rules.
 
-    Characters listed as token separators (spaces, brackets, parens, …) are NOT
-    considered malforming — the tokenizer handles them. Only truly broken chars
-    like '@', '#', '!', '%' make a name malformed.
+    Characters listed as token separators (spaces, brackets, parens, …)
+    are NOT considered malforming — the tokenizer handles them. Only
+    truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
+    malformed.
     """
     tokenizable = set(kb.separators)
     return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
-
-
-def _strip_site_tag(name: str) -> tuple[str, str | None]:
-    """
-    Strip a site watermark tag from the release name and return (clean_name, tag).
-
-    Handles two positions:
-      - Prefix: "[ OxTorrent.vc ] The.Title.S01..."
-      - Suffix: "The.Title.S01...-NTb[TGx]"
-
-    Anything between [...] is treated as a site tag.
-    Returns (original_name, None) if no tag found.
-    """
-    s = name.strip()
-
-    if s.startswith("["):
-        close = s.find("]")
-        if close != -1:
-            tag = s[1:close].strip()
-            remainder = s[close + 1 :].strip()
-            if tag and remainder:
-                return remainder, tag
-
-    if s.endswith("]"):
-        open_bracket = s.rfind("[")
-        if open_bracket != -1:
-            tag = s[open_bracket + 1 : -1].strip()
-            remainder = s[:open_bracket].strip()
-            if tag and remainder:
-                return remainder, tag
-
-    return s, None
-
-
-def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
-    """
-    Parse a single token as a season/episode marker.
-
-    Handles:
-      - SxxExx / SxxExxExx / Sxx (canonical scene form)
-      - NxNN / NxNNxNN (alt form: 1x05, 12x07x08)
-
-    Returns (season, episode, episode_end) or None if not a season token.
-    """
-    upper = tok.upper()
-
-    # SxxExx form
-    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
-        season = int(upper[1:3])
-        rest = upper[3:]
-
-        if not rest:
-            return season, None, None
-
-        episodes: list[int] = []
-        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
-            episodes.append(int(rest[1:3]))
-            rest = rest[3:]
-
-        if not episodes:
-            return None  # malformed token like "S03XYZ"
-
-        return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
-
-    # NxNN form — split on "X" (uppercased), all parts must be digits
-    if "X" in upper:
-        parts = upper.split("X")
-        if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
-            season = int(parts[0])
-            episode = int(parts[1])
-            episode_end = int(parts[2]) if len(parts) >= 3 else None
-            return season, episode, episode_end
-
-    return None
-
-
-def _extract_season_episode(
-    tokens: list[str],
-) -> tuple[int | None, int | None, int | None]:
-    for tok in tokens:
-        parsed = _parse_season_episode(tok)
-        if parsed is not None:
-            return parsed
-    return None, None, None
-
-
-def _extract_tech(
-    tokens: list[str],
-    kb: ReleaseKnowledge,
-) -> tuple[str | None, str | None, str | None, str, set[str]]:
-    """
-    Extract quality, source, codec, group from tokens.
-
-    Returns (quality, source, codec, group, tech_token_set).
-
-    Group extraction strategy (in priority order):
-      1. Token where prefix is a known codec: x265-GROUP
-      2. Rightmost token with a dash that isn't a known source
-    """
-    quality: str | None = None
-    source: str | None = None
-    codec: str | None = None
-    group = "UNKNOWN"
-    tech_tokens: set[str] = set()
-
-    for tok in tokens:
-        tl = tok.lower()
-
-        if tl in kb.resolutions:
-            quality = tok
-            tech_tokens.add(tok)
-            continue
-
-        if tl in kb.sources:
-            source = tok
-            tech_tokens.add(tok)
-            continue
-
-        if "-" in tok:
-            parts = tok.rsplit("-", 1)
-            # codec-GROUP (highest priority for group)
-            if parts[0].lower() in kb.codecs:
-                codec = parts[0]
-                group = parts[1] if parts[1] else "UNKNOWN"
-                tech_tokens.add(tok)
-                continue
-            # source with dash: Web-DL, WEB-DL, etc.
-            if parts[0].lower() in kb.sources or tok.lower().replace("-", "") in kb.sources:
-                source = tok
-                tech_tokens.add(tok)
-                continue
-
-        if tl in kb.codecs:
-            codec = tok
-            tech_tokens.add(tok)
-
-    # Fallback: rightmost token with a dash that isn't a known source
-    if group == "UNKNOWN":
-        for tok in reversed(tokens):
-            if "-" in tok:
-                parts = tok.rsplit("-", 1)
-                tl = tok.lower()
-                if tl in kb.sources or tok.lower().replace("-", "") in kb.sources:
-                    continue
-                if parts[1]:
-                    group = parts[1]
-                    break
-
-    return quality, source, codec, group, tech_tokens
-
-
-def _is_year_token(tok: str) -> bool:
-    """Return True if tok is a 4-digit year between 1900 and 2099."""
-    return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099
-
-
-def _extract_title(
-    tokens: list[str], tech_tokens: set[str], kb: ReleaseKnowledge
-) -> str:
-    """Extract the title portion: everything before the first season/year/tech token."""
-    title_parts = []
-    known_tech = kb.resolutions | kb.sources | kb.codecs
-    for tok in tokens:
-        if _parse_season_episode(tok) is not None:
-            break
-        if _is_year_token(tok):
-            break
-        if tok in tech_tokens or tok.lower() in known_tech:
-            break
-        if "-" in tok and any(p.lower() in kb.codecs | kb.sources for p in tok.split("-")):
-            break
-        title_parts.append(tok)
-
-    return ".".join(title_parts) if title_parts else tokens[0]
-
-
-def _extract_year(tokens: list[str], title: str) -> int | None:
-    """Extract a 4-digit year from tokens (only after the title)."""
-    title_len = len(title.split("."))
-    for tok in tokens[title_len:]:
-        if _is_year_token(tok):
-            return int(tok)
-    return None
-
-
-# ---------------------------------------------------------------------------
-# Sequence matcher
-# ---------------------------------------------------------------------------
-
-
-def _match_sequences(
-    tokens: list[str],
-    sequences: list[dict],
-    key: str,
-) -> tuple[str | None, set[str]]:
-    """
-    Try to match multi-token sequences against consecutive tokens.
-
-    Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
-    Sequences must be ordered most-specific first in the YAML.
-    """
-    upper_tokens = [t.upper() for t in tokens]
-    for seq in sequences:
-        seq_upper = [s.upper() for s in seq["tokens"]]
-        n = len(seq_upper)
-        for i in range(len(upper_tokens) - n + 1):
-            if upper_tokens[i : i + n] == seq_upper:
-                matched = set(tokens[i : i + n])
-                return seq[key], matched
-    return None, set()
-
-
-# ---------------------------------------------------------------------------
-# Language extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_languages(
-    tokens: list[str], kb: ReleaseKnowledge
-) -> tuple[list[str], set[str]]:
-    """Extract language tokens. Returns (languages, matched_token_set)."""
-    languages = []
-    lang_tokens: set[str] = set()
-    for tok in tokens:
-        if tok.upper() in kb.language_tokens:
-            languages.append(tok.upper())
-            lang_tokens.add(tok)
-    return languages, lang_tokens
-
-
-# ---------------------------------------------------------------------------
-# Audio extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_audio(
-    tokens: list[str], kb: ReleaseKnowledge,
-) -> tuple[str | None, str | None, set[str]]:
-    """
-    Extract audio codec and channel layout.
-
-    Returns (audio_codec, audio_channels, matched_token_set).
-    Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
-    """
-    audio_codec: str | None = None
-    audio_channels: str | None = None
-    audio_tokens: set[str] = set()
-
-    known_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
-    known_channels = set(kb.audio.get("channels", []))
-
-    # Try multi-token sequences first
-    matched_codec, matched_set = _match_sequences(
-        tokens, kb.audio.get("sequences", []), "codec"
-    )
-    if matched_codec:
-        audio_codec = matched_codec
-        audio_tokens |= matched_set
-
-    # Channel layouts like "5.1" or "7.1" are split into two tokens by normalize —
-    # detect them as consecutive pairs "X" + "Y" where "X.Y" is a known channel.
-    # The second token may have a "-GROUP" suffix (e.g. "1-KTH" → strip it).
-    for i in range(len(tokens) - 1):
-        second = tokens[i + 1].split("-")[0]
-        candidate = f"{tokens[i]}.{second}"
-        if candidate in known_channels and audio_channels is None:
-            audio_channels = candidate
-            audio_tokens.add(tokens[i])
-            audio_tokens.add(tokens[i + 1])
-
-    for tok in tokens:
-        if tok in audio_tokens:
-            continue
-        if tok.upper() in known_codecs and audio_codec is None:
-            audio_codec = tok
-            audio_tokens.add(tok)
-        elif tok in known_channels and audio_channels is None:
-            audio_channels = tok
-            audio_tokens.add(tok)
-
-    return audio_codec, audio_channels, audio_tokens
-
-
-# ---------------------------------------------------------------------------
-# Video metadata extraction (bit depth, HDR)
-# ---------------------------------------------------------------------------
-
-
-def _extract_video_meta(
-    tokens: list[str], kb: ReleaseKnowledge,
-) -> tuple[str | None, str | None, set[str]]:
-    """
-    Extract bit depth and HDR format.
-
-    Returns (bit_depth, hdr_format, matched_token_set).
-    """
-    bit_depth: str | None = None
-    hdr_format: str | None = None
-    video_tokens: set[str] = set()
-
-    known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
-    known_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
-
-    # Try HDR sequences first
-    matched_hdr, matched_set = _match_sequences(
-        tokens, kb.video_meta.get("sequences", []), "hdr"
-    )
-    if matched_hdr:
-        hdr_format = matched_hdr
-        video_tokens |= matched_set
-
-    for tok in tokens:
-        if tok in video_tokens:
-            continue
-        if tok.upper() in known_hdr and hdr_format is None:
-            hdr_format = tok.upper()
-            video_tokens.add(tok)
-        elif tok.lower() in known_depth and bit_depth is None:
-            bit_depth = tok.lower()
-            video_tokens.add(tok)
-
-    return bit_depth, hdr_format, video_tokens
-
-
-# ---------------------------------------------------------------------------
-# Edition extraction
-# ---------------------------------------------------------------------------
-
-
-def _extract_edition(
-    tokens: list[str], kb: ReleaseKnowledge
-) -> tuple[str | None, set[str]]:
-    """
-    Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
-
-    Returns (edition, matched_token_set).
-    """
-    known_tokens = {t.upper() for t in kb.editions.get("tokens", [])}
-
-    # Try multi-token sequences first
-    matched_edition, matched_set = _match_sequences(
-        tokens, kb.editions.get("sequences", []), "edition"
-    )
-    if matched_edition:
-        return matched_edition, matched_set
-
-    for tok in tokens:
-        if tok.upper() in known_tokens:
-            return tok.upper(), {tok}
-
-    return None, set()
|
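Reviewer note: the removed `_parse_season_episode` keeps only the second episode of a chain as the range end, so a token such as `S14E09E10E11` loses E11. The replacement lives in the v2 pipeline, which this view does not show. The snippet below is only a minimal sketch of the "first episode / last episode" behaviour the changelog describes. The function name `parse_episode_chain` and the regular expression are assumptions made for illustration, not the project's code, and the sketch assumes the episodes in a chain are listed in ascending order.

```python
from __future__ import annotations

import re

# Hypothetical pattern: "S" + two digits, then one or more "E" + two digits.
_CHAIN = re.compile(r"^S(\d{2})((?:E\d{2})+)$", re.IGNORECASE)


def parse_episode_chain(tok: str) -> tuple[int, int, int | None] | None:
    """Return (season, first_episode, last_episode_or_None), or None."""
    match = _CHAIN.match(tok)
    if match is None:
        return None  # not a SxxExx[Exx...] token
    season = int(match.group(1))
    # Every two-digit run in the episode part is one episode number.
    episodes = [int(e) for e in re.findall(r"\d{2}", match.group(2))]
    first, last = episodes[0], episodes[-1]
    # A single episode has no range end; a chain collapses to first..last.
    return season, first, (last if len(episodes) > 1 else None)
```

With this shape, intermediate episodes are implied by the range rather than stored, which matches the `episode` / `episode_end` pair on `ParsedRelease`.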
|||||||
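Reviewer note: the new `parse_release` decides the tokenization route in three steps (apostrophe pre-strip, site-tag strip, well-formed check). The sketch below reproduces only the apostrophe and well-formed steps so the ordering is easy to see; site-tag stripping is left out. `SEPARATORS`, `FORBIDDEN_CHARS` and `choose_route` are hypothetical stand-ins for the knowledge-base values and the real function, chosen for illustration.

```python
from __future__ import annotations

from enum import Enum


class TokenizationRoute(str, Enum):
    DIRECT = "direct"
    SANITIZED = "sanitized"
    AI = "ai"


# Hypothetical knowledge-base values, for illustration only.
SEPARATORS = frozenset(". []()_")
FORBIDDEN_CHARS = frozenset("@#!%'[]")


def is_well_formed(name: str) -> bool:
    """True when no forbidden character (other than a separator) remains."""
    return not any(c in name for c in FORBIDDEN_CHARS if c not in SEPARATORS)


def choose_route(name: str) -> tuple[str, TokenizationRoute]:
    """Return the working name and the route it would take."""
    route = TokenizationRoute.DIRECT
    working = name
    # Apostrophes are stripped before the well-formed check, so "Don't"
    # becomes "Dont" instead of forcing the AI fallback.
    if "'" in working:
        working = working.replace("'", "")
        route = TokenizationRoute.SANITIZED
    # Anything still forbidden at this point goes to the AI route.
    if not is_well_formed(working):
        route = TokenizationRoute.AI
    return working, route
```

The point of the ordering is that the apostrophe is removed before `is_well_formed` runs; checking first would send `Honey.Don't...` down the AI route, which is the behaviour the changelog entry reports as fixed.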
@@ -15,7 +15,7 @@ calling ``kb.sanitize_for_fs(tmdb_title)`` before invoking the builders.
 
 from __future__ import annotations
 
-from dataclasses import dataclass, field
+from dataclasses import dataclass
 from enum import Enum
 
 from ..shared.exceptions import ValidationError
@@ -40,19 +40,27 @@ class MediaTypeToken(str, Enum):
     UNKNOWN = "unknown"
 
 
-class ParsePath(str, Enum):
-    """How a ``ParsedRelease`` was produced. ``str``-backed for the same
-    reasons as :class:`MediaTypeToken`."""
+class TokenizationRoute(str, Enum):
+    """How a ``ParsedRelease`` was produced.
+
+    Records the **tokenization route** — i.e. whether the release name
+    was tokenized as-is (``DIRECT``), after a sanitization pass like
+    site-tag stripping or apostrophe removal (``SANITIZED``), or whether
+    structural parsing failed and an LLM rebuild is needed (``AI``).
+
+    This is **orthogonal** to :class:`~alfred.domain.release.parser.scoring.Road`
+    (EASY / SHITTY / PATH_OF_PAIN), which captures parser confidence and
+    is recorded on :class:`ParseReport`. Both can vary independently —
+    a SANITIZED name can still land on the EASY road if a group schema
+    matches the tokens after stripping.
+
+    ``str``-backed for the same reasons as :class:`MediaTypeToken`."""
 
     DIRECT = "direct"
     SANITIZED = "sanitized"
     AI = "ai"
 
 
-_VALID_MEDIA_TYPES: frozenset[str] = frozenset(m.value for m in MediaTypeToken)
-_VALID_PARSE_PATHS: frozenset[str] = frozenset(p.value for p in ParsePath)
-
-
 def _strip_episode_from_normalized(normalized: str) -> str:
     """
     Remove all episode parts (Exx) from a normalized release name, keeping Sxx.
@@ -72,17 +80,55 @@ def _strip_episode_from_normalized(normalized: str) -> str:
     return ".".join(result)
 
 
-@dataclass
+@dataclass(frozen=True)
+class ParseReport:
+    """Diagnostic report attached to a :class:`ParsedRelease`.
+
+    ``parse_release`` returns ``(ParsedRelease, ParseReport)``. The
+    report describes *how confident* the parser is in the result and
+    *which road* produced it. It is intentionally separate from
+    ``ParsedRelease`` so the structural VO stays free of meta-concerns
+    about its own quality.
+
+    Fields:
+
+    - ``confidence``: integer 0–100 (see :func:`parser.scoring.compute_score`).
+    - ``road``: ``"easy"`` / ``"shitty"`` / ``"path_of_pain"`` — distinct
+      from ``ParsedRelease.parse_path`` (which describes the
+      tokenization route, not the confidence tier).
+    - ``unknown_tokens``: tokens that finished annotation with role
+      UNKNOWN, in order of appearance.
+    - ``missing_critical``: names of critical structural fields the
+      parser couldn't fill (subset of ``{"title", "media_type", "year"}``).
+    """
+
+    confidence: int
+    road: str  # one of parser.scoring.Road values
+    unknown_tokens: tuple[str, ...] = ()
+    missing_critical: tuple[str, ...] = ()
+
+    def __post_init__(self) -> None:
+        if not (0 <= self.confidence <= 100):
+            raise ValidationError(
+                f"ParseReport.confidence out of range: {self.confidence}"
+            )
+
+
+@dataclass(frozen=True)
 class ParsedRelease:
     """Structured representation of a parsed release name.
 
     ``title_sanitized`` carries the filesystem-safe form of ``title`` (computed
     by the parser at construction time using the injected knowledge base).
     Builder methods rely on it being already-sanitized — see module docstring.
+
+    Frozen: enrichment passes (``detect_media_type``, ``enrich_from_probe``)
+    return a **new** ``ParsedRelease`` via ``dataclasses.replace`` rather
+    than mutating in place. ``languages`` is a tuple for the same reason.
     """
 
     raw: str  # original release name (untouched)
-    normalised: str  # dots instead of spaces
+    clean: str  # raw minus site_tag and apostrophes — used by season_folder_name()
     title: str  # show/movie title (dots, no year/season/tech)
     title_sanitized: str  # title with filesystem-forbidden chars stripped
     year: int | None  # movie year or show start year (from TMDB)
@@ -93,18 +139,18 @@ class ParsedRelease:
     source: str | None  # WEBRip, BluRay, …
     codec: str | None  # x265, HEVC, …
     group: str  # release group, "UNKNOWN" if missing
-    tech_string: str  # quality.source.codec joined with dots
     media_type: MediaTypeToken = MediaTypeToken.UNKNOWN
     site_tag: str | None = (
         None  # site watermark stripped from name, e.g. "TGx", "OxTorrent.vc"
     )
-    parse_path: ParsePath = ParsePath.DIRECT
-    languages: list[str] = field(default_factory=list)  # ["MULTI", "VFF"], ["FRENCH"], …
+    parse_path: TokenizationRoute = TokenizationRoute.DIRECT
+    languages: tuple[str, ...] = ()  # ("MULTI", "VFF"), ("FRENCH",), …
     audio_codec: str | None = None  # "DTS-HD.MA", "DDP", "EAC3", …
     audio_channels: str | None = None  # "5.1", "7.1", "2.0", …
     bit_depth: str | None = None  # "10bit", "8bit", …
     hdr_format: str | None = None  # "DV", "HDR10", "DV.HDR10", …
     edition: str | None = None  # "UNRATED", "EXTENDED", "DIRECTORS.CUT", …
+    distributor: str | None = None  # "NF", "AMZN", "DSNP", … (streaming origin)
 
     def __post_init__(self) -> None:
         if not self.raw:
@@ -133,28 +179,30 @@ class ParsedRelease:
                 f"ParsedRelease.episode_end ({self.episode_end}) < "
                 f"episode ({self.episode})"
             )
-        # Coerce raw strings into their enum form (tolerant constructor).
         if not isinstance(self.media_type, MediaTypeToken):
-            try:
-                self.media_type = MediaTypeToken(self.media_type)
-            except ValueError:
-                raise ValidationError(
-                    f"ParsedRelease.media_type invalid: {self.media_type!r} "
-                    f"(expected one of {sorted(_VALID_MEDIA_TYPES)})"
-                ) from None
+            raise ValidationError(
+                f"ParsedRelease.media_type must be a MediaTypeToken, "
+                f"got {type(self.media_type).__name__}: {self.media_type!r}"
+            )
         if not isinstance(self.parse_path, ParsePath):
-            try:
-                self.parse_path = ParsePath(self.parse_path)
-            except ValueError:
-                raise ValidationError(
-                    f"ParsedRelease.parse_path invalid: {self.parse_path!r} "
-                    f"(expected one of {sorted(_VALID_PARSE_PATHS)})"
-                ) from None
+        if not isinstance(self.parse_path, TokenizationRoute):
+            raise ValidationError(
+                f"ParsedRelease.parse_path must be a TokenizationRoute, "
+                f"got {type(self.parse_path).__name__}: {self.parse_path!r}"
+            )
 
     @property
     def is_season_pack(self) -> bool:
         return self.season is not None and self.episode is None
 
+    @property
+    def tech_string(self) -> str:
+        """``quality.source.codec`` joined by dots, skipping ``None`` parts.
+
+        Derived on every access so it stays in sync with the underlying
+        fields — no manual refresh needed after enrichment.
+        """
+        return ".".join(p for p in (self.quality, self.source, self.codec) if p)
+
     def show_folder_name(self, tmdb_title_safe: str, tmdb_year: int) -> str:
         """
         Build the series root folder name.
@@ -177,7 +225,7 @@ class ParsedRelease:
         For a single-episode release we still strip the episode token so the
         folder can hold the whole season.
         """
-        return _strip_episode_from_normalized(self.normalised)
+        return _strip_episode_from_normalized(self.clean)
 
     def episode_filename(self, tmdb_episode_title_safe: str | None, ext: str) -> str:
         """
|
|||||||
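Reviewer note: the `ParsedRelease` docstring says enrichment passes return a new instance via `dataclasses.replace`, and `tech_string` is now a derived property. The sketch below shows how those two choices interact on a trimmed-down value object. `Release` and `enrich_codec` are hypothetical names used only for this illustration; the real class has many more fields and its own validation.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Release:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    title: str
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    languages: tuple[str, ...] = ()  # tuple, so the frozen VO holds no mutable list

    @property
    def tech_string(self) -> str:
        # Recomputed on each access, so it cannot go stale after enrichment.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


def enrich_codec(release: Release, probed_codec: str) -> Release:
    """Hypothetical enrichment pass: returns a new instance, never mutates."""
    return replace(release, codec=probed_codec)
```

Because `tech_string` is computed from the current fields, the copy produced by `replace` reports the enriched value without any manual refresh, while the original instance is left untouched.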
@@ -0,0 +1,267 @@
|
|||||||
|
"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
|
||||||
|
|
||||||
|
These are the **container-view** dataclasses, populated from ffprobe output and
|
||||||
|
used across the project to describe the content of a media file.
|
||||||
|
|
||||||
|
Not to be confused with ``alfred.domain.subtitles.entities.SubtitleScanResult``
|
||||||
|
which models a subtitle being **scanned/matched** (with confidence, raw tokens,
|
||||||
|
file path, etc.). The two coexist by design — they describe the same real-world
|
||||||
|
concept seen from two different bounded contexts.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
|
||||||
|
from .value_objects import Language
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"AudioTrack",
|
||||||
|
"MediaInfo",
|
||||||
|
"MediaWithTracks",
|
||||||
|
"SubtitleTrack",
|
||||||
|
"VideoTrack",
|
||||||
|
"track_lang_matches",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Track types — one frozen dataclass per stream kind
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class AudioTrack:
|
||||||
|
"""A single audio track as reported by ffprobe."""
|
||||||
|
|
||||||
|
index: int
|
||||||
|
codec: str | None # aac, ac3, eac3, dts, truehd, flac, …
|
||||||
|
channels: int | None # 2, 6 (5.1), 8 (7.1), …
|
||||||
|
channel_layout: str | None # stereo, 5.1, 7.1, …
|
||||||
|
language: str | None # ISO 639-2: fre, eng, und, …
|
||||||
|
is_default: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class SubtitleTrack:
|
||||||
|
"""A single embedded subtitle track as reported by ffprobe."""
|
||||||
|
|
||||||
|
index: int
|
||||||
|
codec: str | None # subrip, ass, hdmv_pgs_subtitle, …
|
||||||
|
language: str | None # ISO 639-2: fre, eng, und, …
|
||||||
|
is_default: bool = False
|
||||||
|
is_forced: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class VideoTrack:
|
||||||
|
"""A single video track as reported by ffprobe.
|
||||||
|
|
||||||
|
A media file typically has one video track but can have several (alt
|
||||||
|
camera angles, attached thumbnail images reported as still-image streams,
|
||||||
|
etc.), hence the tuple[VideoTrack, ...] on MediaInfo.
|
||||||
|
"""
|
||||||
|
|
||||||
|
index: int
|
||||||
|
codec: str | None # h264, hevc, av1, …
|
||||||
|
width: int | None
|
||||||
|
height: int | None
|
||||||
|
is_default: bool = False
|
||||||
|
|
||||||
|
@property
|
||||||
|
def resolution(self) -> str | None:
|
||||||
|
"""
|
||||||
|
Best-effort resolution string: 2160p, 1080p, 720p, …
|
||||||
|
|
||||||
|
Width takes priority over height to handle widescreen/cinema crops
|
||||||
|
(e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
|
||||||
|
width is unavailable.
|
||||||
|
"""
|
||||||
|
match (self.width, self.height):
|
||||||
|
case (None, None):
|
||||||
|
return None
|
||||||
|
case (w, h) if w is not None:
|
||||||
|
match True:
|
||||||
|
case _ if w >= 3840:
|
||||||
|
return "2160p"
|
||||||
|
case _ if w >= 1920:
|
||||||
|
return "1080p"
|
||||||
|
case _ if w >= 1280:
|
||||||
|
return "720p"
|
||||||
|
case _ if w >= 720:
|
||||||
|
return "576p"
|
||||||
|
case _ if w >= 640:
|
||||||
|
return "480p"
|
||||||
|
case _:
|
||||||
|
return f"{h}p" if h else f"{w}w"
|
||||||
|
case (None, h):
|
||||||
|
match True:
|
||||||
|
case _ if h >= 2160:
|
||||||
|
return "2160p"
|
||||||
|
case _ if h >= 1080:
|
||||||
|
return "1080p"
|
||||||
|
case _ if h >= 720:
|
||||||
|
return "720p"
|
||||||
|
case _ if h >= 576:
|
||||||
|
return "576p"
|
||||||
|
case _ if h >= 480:
|
||||||
|
return "480p"
|
||||||
|
case _:
|
||||||
|
return f"{h}p"
|
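The width-first rule described in the `resolution` docstring can be checked in isolation. The sketch below is only an illustration: `resolution_label` is a hypothetical free function that mirrors the thresholds shown in the diff, not something the codebase defines.

```python
from __future__ import annotations

# Thresholds copied from the VideoTrack.resolution property in the diff.
_WIDTH_STEPS = ((3840, "2160p"), (1920, "1080p"), (1280, "720p"), (720, "576p"), (640, "480p"))
_HEIGHT_STEPS = ((2160, "2160p"), (1080, "1080p"), (720, "720p"), (576, "576p"), (480, "480p"))


def resolution_label(width: int | None, height: int | None) -> str | None:
    """Hypothetical helper mirroring VideoTrack.resolution."""
    if width is None and height is None:
        return None
    if width is not None:
        # Width wins, so a 1920x960 scope crop is still labelled 1080p.
        for threshold, label in _WIDTH_STEPS:
            if width >= threshold:
                return label
        # Below every known width: report the raw height, else the raw width.
        return f"{height}p" if height else f"{width}w"
    # Width unknown: fall back to the height thresholds.
    for threshold, label in _HEIGHT_STEPS:
        if height >= threshold:
            return label
    return f"{height}p"


print(resolution_label(1920, 960))   # width-first classification
print(resolution_label(None, 960))   # height-only classification
```

The two calls differ only in whether the width is known, which is exactly the case the docstring calls out: the same 960-pixel height lands in a different bucket once the width is missing.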
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# MediaInfo — assembles video/audio/subtitle tracks for a media file
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class MediaInfo:
|
||||||
|
"""
|
||||||
|
File-level media metadata extracted by ffprobe — immutable snapshot.
|
||||||
|
|
||||||
|
Symmetric design: every stream type is a tuple of typed track objects
|
||||||
|
(immutable on purpose — a MediaInfo is a frozen view of one ffprobe run,
|
||||||
|
not a mutable collection to append to).
|
||||||
|
Backwards-compatible flat accessors (``resolution``, ``width``, …) read
|
||||||
|
from the first video track when present.
|
||||||
|
"""
|
||||||
|
|
||||||
|
video_tracks: tuple[VideoTrack, ...] = field(default_factory=tuple)
|
||||||
|
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||||
|
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||||
|
|
||||||
|
# File-level (from ffprobe ``format`` block, not from any single stream)
|
||||||
|
duration_seconds: float | None = None
|
||||||
|
bitrate_kbps: int | None = None
|
||||||
|
|
||||||
|
# ──────────────────────────────────────────────────────────────────────
|
||||||
|
# Video conveniences — read the first video track
|
||||||
|
# ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
@property
|
||||||
|
def primary_video(self) -> VideoTrack | None:
|
||||||
|
return self.video_tracks[0] if self.video_tracks else None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def width(self) -> int | None:
|
||||||
|
v = self.primary_video
|
||||||
|
return v.width if v else None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def height(self) -> int | None:
|
||||||
|
v = self.primary_video
|
||||||
|
return v.height if v else None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def video_codec(self) -> str | None:
|
||||||
|
v = self.primary_video
|
||||||
|
return v.codec if v else None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def resolution(self) -> str | None:
|
||||||
|
v = self.primary_video
|
||||||
|
return v.resolution if v else None
|
||||||
|
|
||||||
|
# ──────────────────────────────────────────────────────────────────────
|
||||||
|
# Audio conveniences
|
||||||
|
# ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
@property
|
||||||
|
def audio_languages(self) -> list[str]:
|
||||||
|
"""Unique audio languages across all tracks (ISO 639-2)."""
|
||||||
|
seen: set[str] = set()
|
||||||
|
result: list[str] = []
|
||||||
|
for track in self.audio_tracks:
|
||||||
|
if track.language and track.language not in seen:
|
||||||
|
seen.add(track.language)
|
||||||
|
result.append(track.language)
|
||||||
|
return result
|
||||||
|
|
||||||
|
@property
|
||||||
|
def is_multi_audio(self) -> bool:
|
||||||
|
"""True if more than one audio language is present."""
|
||||||
|
return len(self.audio_languages) > 1
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Language matching — shared helper + mixin
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
|
||||||
|
"""
|
||||||
|
Match a track's language string against a query (contract "C+").
|
||||||
|
|
||||||
|
* ``Language`` query → matches if the track string is any known
|
||||||
|
representation of that Language (delegates to ``Language.matches``).
|
||||||
|
Powerful, cross-format mode.
|
||||||
|
* ``str`` query → case-insensitive direct comparison against
|
||||||
|
``track_lang``. Simple, no normalization, no registry lookup.
|
||||||
|
|
||||||
|
Callers needing cross-format resolution (``"fr"`` ↔ ``"fre"`` ↔
|
||||||
|
``"french"``) should resolve their string through a ``LanguageRegistry``
|
||||||
|
once and pass the resulting ``Language``.
|
||||||
|
"""
|
||||||
|
if track_lang is None:
|
||||||
|
return False
|
||||||
|
if isinstance(query, Language):
|
||||||
|
return query.matches(track_lang)
|
||||||
|
if isinstance(query, str):
|
||||||
|
return track_lang.lower().strip() == query.lower().strip()
|
||||||
|
return False
|
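A short sketch of the two modes of the "C+" contract may help. The `Language` class here is a stand-in written for the example: the real `Language.matches` is not shown in this diff, so the stand-in simply assumes it compares against the ISO code and the aliases.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Stand-in value object; the real one lives in value_objects."""

    iso: str
    aliases: tuple[str, ...] = ()

    def matches(self, raw: str) -> bool:
        # Assumed behaviour: case-insensitive match on iso code or any alias.
        key = raw.lower().strip()
        return key == self.iso or key in self.aliases


def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
    """Same logic as the function in the diff."""
    if track_lang is None:
        return False
    if isinstance(query, Language):
        return query.matches(track_lang)
    if isinstance(query, str):
        return track_lang.lower().strip() == query.lower().strip()
    return False


french = Language(iso="fre", aliases=("fr", "fra", "french"))

print(track_lang_matches("fre", " FRE "))  # str query: plain case-insensitive compare
print(track_lang_matches("fr", "fre"))     # str query: no cross-format resolution
print(track_lang_matches("fr", french))    # Language query: alias-aware
```

The middle call is the one to watch: a `str` query never bridges `"fr"` and `"fre"`, which is why the docstring asks callers to resolve their string to a `Language` first.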
||||||
|
|
||||||
|
|
||||||
|
class MediaWithTracks:
|
||||||
|
"""
|
||||||
|
Mixin providing audio/subtitle helpers for entities with track collections.
|
||||||
|
|
||||||
|
Hosts must expose two attributes:
|
||||||
|
|
||||||
|
* ``audio_tracks: tuple[AudioTrack, ...]``
|
||||||
|
* ``subtitle_tracks: tuple[SubtitleTrack, ...]``
|
||||||
|
|
||||||
|
The helpers follow the "C+" matching contract: pass a :class:`Language`
|
||||||
|
for cross-format matching, or a ``str`` for case-insensitive comparison.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# These attributes are provided by the host entity (Movie, Episode, …).
|
||||||
|
# Declared here only for type-checkers and to make the contract explicit.
|
||||||
|
audio_tracks: tuple[AudioTrack, ...]
|
||||||
|
subtitle_tracks: tuple[SubtitleTrack, ...]
|
||||||
|
|
||||||
|
# ── Audio helpers ──────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def has_audio_in(self, lang: str | Language) -> bool:
|
||||||
|
"""True if at least one audio track is in the given language."""
|
||||||
|
return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
|
||||||
|
|
||||||
|
def audio_languages(self) -> list[str]:
|
||||||
|
"""Unique audio languages across all tracks, in track order."""
|
||||||
|
seen: set[str] = set()
|
||||||
|
result: list[str] = []
|
||||||
|
for t in self.audio_tracks:
|
||||||
|
if t.language and t.language not in seen:
|
||||||
|
seen.add(t.language)
|
||||||
|
result.append(t.language)
|
||||||
|
return result
|
||||||
|
|
||||||
|
# ── Subtitle helpers ───────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def has_subtitles_in(self, lang: str | Language) -> bool:
|
||||||
|
"""True if at least one subtitle track is in the given language."""
|
||||||
|
return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
|
||||||
|
|
||||||
|
def has_forced_subs(self) -> bool:
|
||||||
|
"""True if at least one subtitle track is flagged as forced."""
|
||||||
|
return any(t.is_forced for t in self.subtitle_tracks)
|
||||||
|
|
||||||
|
def subtitle_languages(self) -> list[str]:
|
||||||
|
"""Unique subtitle languages across all tracks, in track order."""
|
||||||
|
seen: set[str] = set()
|
||||||
|
result: list[str] = []
|
||||||
|
for t in self.subtitle_tracks:
|
||||||
|
if t.language and t.language not in seen:
|
||||||
|
seen.add(t.language)
|
||||||
|
result.append(t.language)
|
||||||
|
return result
|
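As a rough illustration of how the mixin composes with a frozen dataclass host, here is a trimmed sketch. `TracksMixin` copies two of the helpers shown above, and `Movie` is a hypothetical host with made-up fields; neither is the project's actual class.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleTrack:
    """Trimmed copy of the SubtitleTrack dataclass from the diff."""

    index: int
    language: str | None
    is_forced: bool = False


class TracksMixin:
    """Trimmed copy of two MediaWithTracks helpers."""

    # Declared for type-checkers only; the host supplies the real field.
    subtitle_tracks: tuple[SubtitleTrack, ...]

    def has_forced_subs(self) -> bool:
        return any(t.is_forced for t in self.subtitle_tracks)

    def subtitle_languages(self) -> list[str]:
        seen: set[str] = set()
        result: list[str] = []
        for t in self.subtitle_tracks:
            if t.language and t.language not in seen:
                seen.add(t.language)
                result.append(t.language)
        return result


@dataclass(frozen=True)
class Movie(TracksMixin):
    """Hypothetical host entity; the field names are assumptions."""

    title: str
    subtitle_tracks: tuple[SubtitleTrack, ...] = ()


movie = Movie(
    title="Example",
    subtitle_tracks=(
        SubtitleTrack(0, "eng"),
        SubtitleTrack(1, "fre", is_forced=True),
        SubtitleTrack(2, "eng"),
        SubtitleTrack(3, None),
    ),
)
print(movie.subtitle_languages(), movie.has_forced_subs())
```

Because the mixin is a plain class, its bare annotation is not picked up as a dataclass field; the host's own field declaration is what `@dataclass` sees.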
||||||
@@ -1,21 +0,0 @@
|
|||||||
"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
|
|
||||||
|
|
||||||
These are the **container-view** dataclasses, populated from ffprobe output and
|
|
||||||
used across the project to describe the content of a media file.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from .audio import AudioTrack
|
|
||||||
from .info import MediaInfo
|
|
||||||
from .matching import track_lang_matches
|
|
||||||
from .subtitle import SubtitleTrack
|
|
||||||
from .tracks_mixin import MediaWithTracks
|
|
||||||
from .video import VideoTrack
|
|
||||||
|
|
||||||
__all__ = [
|
|
||||||
"AudioTrack",
|
|
||||||
"MediaInfo",
|
|
||||||
"MediaWithTracks",
|
|
||||||
"SubtitleTrack",
|
|
||||||
"VideoTrack",
|
|
||||||
"track_lang_matches",
|
|
||||||
]
|
|
||||||
@@ -1,17 +0,0 @@
|
|||||||
"""AudioTrack — a single audio stream as reported by ffprobe."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from dataclasses import dataclass
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
|
||||||
class AudioTrack:
|
|
||||||
"""A single audio track as reported by ffprobe."""
|
|
||||||
|
|
||||||
index: int
|
|
||||||
codec: str | None # aac, ac3, eac3, dts, truehd, flac, …
|
|
||||||
channels: int | None # 2, 6 (5.1), 8 (7.1), …
|
|
||||||
channel_layout: str | None # stereo, 5.1, 7.1, …
|
|
||||||
language: str | None # ISO 639-2: fre, eng, und, …
|
|
||||||
is_default: bool = False
|
|
||||||
@@ -1,78 +0,0 @@
|
|||||||
"""MediaInfo — assembles video, audio and subtitle tracks for a media file."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from dataclasses import dataclass, field
|
|
||||||
|
|
||||||
from .audio import AudioTrack
|
|
||||||
from .subtitle import SubtitleTrack
|
|
||||||
from .video import VideoTrack
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
|
||||||
class MediaInfo:
|
|
||||||
"""
|
|
||||||
File-level media metadata extracted by ffprobe — immutable snapshot.
|
|
||||||
|
|
||||||
Symmetric design: every stream type is a tuple of typed track objects
|
|
||||||
(immutable on purpose — a MediaInfo is a frozen view of one ffprobe run,
|
|
||||||
not a mutable collection to append to).
|
|
||||||
Backwards-compatible flat accessors (``resolution``, ``width``, …) read
|
|
||||||
from the first video track when present.
|
|
||||||
"""
|
|
||||||
|
|
||||||
video_tracks: tuple[VideoTrack, ...] = field(default_factory=tuple)
|
|
||||||
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
|
||||||
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
|
||||||
|
|
||||||
# File-level (from ffprobe ``format`` block, not from any single stream)
|
|
||||||
duration_seconds: float | None = None
|
|
||||||
bitrate_kbps: int | None = None
|
|
||||||
|
|
||||||
# ──────────────────────────────────────────────────────────────────────
|
|
||||||
# Video conveniences — read the first video track
|
|
||||||
# ──────────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
@property
|
|
||||||
def primary_video(self) -> VideoTrack | None:
|
|
||||||
return self.video_tracks[0] if self.video_tracks else None
|
|
||||||
|
|
||||||
@property
|
|
||||||
def width(self) -> int | None:
|
|
||||||
v = self.primary_video
|
|
||||||
return v.width if v else None
|
|
||||||
|
|
||||||
@property
|
|
||||||
def height(self) -> int | None:
|
|
||||||
v = self.primary_video
|
|
||||||
return v.height if v else None
|
|
||||||
|
|
||||||
@property
|
|
||||||
def video_codec(self) -> str | None:
|
|
||||||
v = self.primary_video
|
|
||||||
return v.codec if v else None
|
|
||||||
|
|
||||||
@property
|
|
||||||
def resolution(self) -> str | None:
|
|
||||||
v = self.primary_video
|
|
||||||
return v.resolution if v else None
|
|
||||||
|
|
||||||
# ──────────────────────────────────────────────────────────────────────
|
|
||||||
# Audio conveniences
|
|
||||||
# ──────────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
@property
|
|
||||||
def audio_languages(self) -> list[str]:
|
|
||||||
"""Unique audio languages across all tracks (ISO 639-2)."""
|
|
||||||
seen: set[str] = set()
|
|
||||||
result: list[str] = []
|
|
||||||
for track in self.audio_tracks:
|
|
||||||
if track.language and track.language not in seen:
|
|
||||||
seen.add(track.language)
|
|
||||||
result.append(track.language)
|
|
||||||
return result
|
|
||||||
|
|
||||||
@property
|
|
||||||
def is_multi_audio(self) -> bool:
|
|
||||||
"""True if more than one audio language is present."""
|
|
||||||
return len(self.audio_languages) > 1
|
|
||||||
@@ -1,33 +0,0 @@
|
|||||||
"""Language-matching helper shared by media-bearing entities.
|
|
||||||
|
|
||||||
Both ``Episode`` and ``Movie`` carry ``audio_tracks`` / ``subtitle_tracks`` and
|
|
||||||
need to answer "do I have audio in language X?". The matching contract is the
|
|
||||||
same in both cases — keep it in one place.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from ..value_objects import Language
|
|
||||||
|
|
||||||
|
|
||||||
def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
|
|
||||||
"""
|
|
||||||
Match a track's language string against a query (contract "C+").
|
|
||||||
|
|
||||||
* ``Language`` query → matches if the track string is any known
|
|
||||||
representation of that Language (delegates to ``Language.matches``).
|
|
||||||
Powerful, cross-format mode.
|
|
||||||
* ``str`` query → case-insensitive direct comparison against
|
|
||||||
``track_lang``. Simple, no normalization, no registry lookup.
|
|
||||||
|
|
||||||
Callers needing cross-format resolution (``"fr"`` ↔ ``"fre"`` ↔
|
|
||||||
``"french"``) should resolve their string through a ``LanguageRegistry``
|
|
||||||
once and pass the resulting ``Language``.
|
|
||||||
"""
|
|
||||||
if track_lang is None:
|
|
||||||
return False
|
|
||||||
if isinstance(query, Language):
|
|
||||||
return query.matches(track_lang)
|
|
||||||
if isinstance(query, str):
|
|
||||||
return track_lang.lower().strip() == query.lower().strip()
|
|
||||||
return False
|
|
||||||
@@ -1,25 +0,0 @@
|
|||||||
"""SubtitleTrack — a single embedded subtitle stream as reported by ffprobe.
|
|
||||||
|
|
||||||
This is the **container-view** representation (ffprobe output) used uniformly
|
|
||||||
across the project to describe a subtitle stream embedded in a media file.
|
|
||||||
|
|
||||||
Not to be confused with ``alfred.domain.subtitles.entities.SubtitleCandidate``
|
|
||||||
which models a subtitle being **scanned/matched** (with confidence, raw tokens,
|
|
||||||
file path, etc.). The two coexist by design — they describe the same real-world
|
|
||||||
concept seen from two different bounded contexts.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from dataclasses import dataclass
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
|
||||||
class SubtitleTrack:
|
|
||||||
"""A single embedded subtitle track as reported by ffprobe."""
|
|
||||||
|
|
||||||
index: int
|
|
||||||
codec: str | None # subrip, ass, hdmv_pgs_subtitle, …
|
|
||||||
language: str | None # ISO 639-2: fre, eng, und, …
|
|
||||||
is_default: bool = False
|
|
||||||
is_forced: bool = False
|
|
||||||
@@ -1,77 +0,0 @@
|
|||||||
"""Mixin shared by entities that carry audio + subtitle tracks.
|
|
||||||
|
|
||||||
Both ``Movie`` and ``Episode`` carry a ``list[AudioTrack]`` plus a
|
|
||||||
``list[SubtitleTrack]`` and answer the same 5 queries about them (language
|
|
||||||
presence, unique languages, forced flag). Keep that behavior in one place so a
|
|
||||||
fix in one is a fix in both.
|
|
||||||
|
|
||||||
The mixin is plain Python (no dataclass machinery) so it composes cleanly with
|
|
||||||
``@dataclass`` entities — it only reads ``self.audio_tracks`` and
|
|
||||||
``self.subtitle_tracks`` which the host class provides as fields.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from typing import TYPE_CHECKING
|
|
||||||
|
|
||||||
from ..value_objects import Language
|
|
||||||
from .matching import track_lang_matches
|
|
||||||
|
|
||||||
if TYPE_CHECKING:
|
|
||||||
from .audio import AudioTrack
|
|
||||||
from .subtitle import SubtitleTrack
|
|
||||||
|
|
||||||
|
|
||||||
class MediaWithTracks:
|
|
||||||
"""
|
|
||||||
Mixin providing audio/subtitle helpers for entities with track collections.
|
|
||||||
|
|
||||||
Hosts must expose two attributes:
|
|
||||||
|
|
||||||
* ``audio_tracks: list[AudioTrack]``
|
|
||||||
* ``subtitle_tracks: list[SubtitleTrack]``
|
|
||||||
|
|
||||||
The helpers follow the "C+" matching contract: pass a :class:`Language`
|
|
||||||
for cross-format matching, or a ``str`` for case-insensitive comparison.
|
|
||||||
"""
|
|
||||||
|
|
||||||
# These attributes are provided by the host entity (Movie, Episode, …).
|
|
||||||
# Declared here only for type-checkers and to make the contract explicit.
|
|
||||||
audio_tracks: list["AudioTrack"]
|
|
||||||
subtitle_tracks: list["SubtitleTrack"]
|
|
||||||
|
|
||||||
# ── Audio helpers ──────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def has_audio_in(self, lang: str | Language) -> bool:
|
|
||||||
"""True if at least one audio track is in the given language."""
|
|
||||||
return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
|
|
||||||
|
|
||||||
def audio_languages(self) -> list[str]:
|
|
||||||
"""Unique audio languages across all tracks, in track order."""
|
|
||||||
seen: set[str] = set()
|
|
||||||
result: list[str] = []
|
|
||||||
for t in self.audio_tracks:
|
|
||||||
if t.language and t.language not in seen:
|
|
||||||
seen.add(t.language)
|
|
||||||
result.append(t.language)
|
|
||||||
return result
|
|
||||||
|
|
||||||
# ── Subtitle helpers ───────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def has_subtitles_in(self, lang: str | Language) -> bool:
|
|
||||||
"""True if at least one subtitle track is in the given language."""
|
|
||||||
return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
|
|
||||||
|
|
||||||
def has_forced_subs(self) -> bool:
|
|
||||||
"""True if at least one subtitle track is flagged as forced."""
|
|
||||||
return any(t.is_forced for t in self.subtitle_tracks)
|
|
||||||
|
|
||||||
def subtitle_languages(self) -> list[str]:
|
|
||||||
"""Unique subtitle languages across all tracks, in track order."""
|
|
||||||
seen: set[str] = set()
|
|
||||||
result: list[str] = []
|
|
||||||
for t in self.subtitle_tracks:
|
|
||||||
if t.language and t.language not in seen:
|
|
||||||
seen.add(t.language)
|
|
||||||
result.append(t.language)
|
|
||||||
return result
|
|
||||||
@@ -1,62 +0,0 @@
|
|||||||
"""VideoTrack — a single video stream as reported by ffprobe."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from dataclasses import dataclass
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
|
||||||
class VideoTrack:
|
|
||||||
"""A single video track as reported by ffprobe.
|
|
||||||
|
|
||||||
A media file typically has one video track but can have several (alt
|
|
||||||
camera angles, attached thumbnail images reported as still-image streams,
|
|
||||||
etc.), hence the list[VideoTrack] on MediaInfo.
|
|
||||||
"""
|
|
||||||
|
|
||||||
index: int
|
|
||||||
codec: str | None # h264, hevc, av1, …
|
|
||||||
width: int | None
|
|
||||||
height: int | None
|
|
||||||
is_default: bool = False
|
|
||||||
|
|
||||||
@property
|
|
||||||
def resolution(self) -> str | None:
|
|
||||||
"""
|
|
||||||
Best-effort resolution string: 2160p, 1080p, 720p, …
|
|
||||||
|
|
||||||
Width takes priority over height to handle widescreen/cinema crops
|
|
||||||
(e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
|
|
||||||
width is unavailable.
|
|
||||||
"""
|
|
||||||
match (self.width, self.height):
|
|
||||||
case (None, None):
|
|
||||||
return None
|
|
||||||
case (w, h) if w is not None:
|
|
||||||
match True:
|
|
||||||
case _ if w >= 3840:
|
|
||||||
return "2160p"
|
|
||||||
case _ if w >= 1920:
|
|
||||||
return "1080p"
|
|
||||||
case _ if w >= 1280:
|
|
||||||
return "720p"
|
|
||||||
case _ if w >= 720:
|
|
||||||
return "576p"
|
|
||||||
case _ if w >= 640:
|
|
||||||
return "480p"
|
|
||||||
case _:
|
|
||||||
return f"{h}p" if h else f"{w}w"
|
|
||||||
case (None, h):
|
|
||||||
match True:
|
|
||||||
case _ if h >= 2160:
|
|
||||||
return "2160p"
|
|
||||||
case _ if h >= 1080:
|
|
||||||
return "1080p"
|
|
||||||
case _ if h >= 720:
|
|
||||||
return "720p"
|
|
||||||
case _ if h >= 576:
|
|
||||||
return "576p"
|
|
||||||
case _ if h >= 480:
|
|
||||||
return "480p"
|
|
||||||
case _:
|
|
||||||
return f"{h}p"
|
|
||||||
@@ -7,11 +7,13 @@ Protocol without going through real I/O.
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
from .filesystem_scanner import FileEntry, FilesystemScanner
|
from .filesystem_scanner import FileEntry, FilesystemScanner
|
||||||
|
from .language_repository import LanguageRepository
|
||||||
from .media_prober import MediaProber, SubtitleStreamInfo
|
from .media_prober import MediaProber, SubtitleStreamInfo
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
"FileEntry",
|
"FileEntry",
|
||||||
"FilesystemScanner",
|
"FilesystemScanner",
|
||||||
|
"LanguageRepository",
|
||||||
"MediaProber",
|
"MediaProber",
|
||||||
"SubtitleStreamInfo",
|
"SubtitleStreamInfo",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -0,0 +1,36 @@
|
|||||||
|
"""LanguageRepository port — abstracts canonical language lookup.
|
||||||
|
|
||||||
|
The adapter (typically loading from ISO 639 YAML knowledge) maps a wide
|
||||||
|
range of raw forms (codes, English/native names, aliases) onto the
|
||||||
|
canonical :class:`Language` value object. Domain code accepts the port
|
||||||
|
via constructor injection; tests can pass a small in-memory fake.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from typing import Protocol
|
||||||
|
|
||||||
|
from alfred.domain.shared.value_objects import Language
|
||||||
|
|
||||||
|
|
||||||
|
class LanguageRepository(Protocol):
|
||||||
|
"""Canonical language lookup."""
|
||||||
|
|
||||||
|
def from_iso(self, code: str) -> Language | None:
|
||||||
|
"""Look up by canonical ISO 639-2/B code (case-insensitive)."""
|
||||||
|
...
|
||||||
|
|
||||||
|
def from_any(self, raw: str) -> Language | None:
|
||||||
|
"""Look up by any known representation: ISO code, name, alias.
|
||||||
|
|
||||||
|
Case-insensitive. Returns ``None`` when the raw form is unknown.
|
||||||
|
"""
|
||||||
|
...
|
||||||
|
|
||||||
|
def all(self) -> list[Language]:
|
||||||
|
"""Return all known languages, in a stable order."""
|
||||||
|
...
|
||||||
|
|
||||||
|
def __contains__(self, raw: str) -> bool: ...
|
||||||
|
|
||||||
|
def __len__(self) -> int: ...
|
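The module docstring mentions that tests can pass a small in-memory fake. One possible shape for such a fake is sketched below. `InMemoryLanguageRepository` and the trimmed `Language` stand-in are assumptions made for the example, not classes from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Stand-in for the shared Language value object."""

    iso: str
    english_name: str
    aliases: tuple[str, ...] = ()


class InMemoryLanguageRepository:
    """Hypothetical fake that structurally satisfies the LanguageRepository port."""

    def __init__(self, languages: list[Language]) -> None:
        self._by_iso: dict[str, Language] = {lang.iso: lang for lang in languages}
        self._by_any: dict[str, Language] = {}
        for lang in languages:
            for key in (lang.iso, lang.english_name.lower(), *lang.aliases):
                # First registration wins if two languages share a key.
                self._by_any.setdefault(key, lang)

    def from_iso(self, code: str) -> Language | None:
        return self._by_iso.get(code.lower())

    def from_any(self, raw: str) -> Language | None:
        return self._by_any.get(raw.lower())

    def all(self) -> list[Language]:
        # Dicts preserve insertion order, so the order is stable.
        return list(self._by_iso.values())

    def __contains__(self, raw: str) -> bool:
        return self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


repo = InMemoryLanguageRepository(
    [
        Language("fre", "French", ("fr", "fra")),
        Language("eng", "English", ("en",)),
    ]
)
print(repo.from_any("French"), "klingon" in repo, len(repo))
```

Since the port is a `typing.Protocol`, the fake does not need to inherit from it; matching method signatures is enough for a type-checker.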
||||||
@@ -9,7 +9,10 @@ from __future__ import annotations
|
|||||||
|
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Protocol
|
from typing import TYPE_CHECKING, Protocol
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from alfred.domain.shared.media import MediaInfo
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
@@ -37,3 +40,13 @@ class MediaProber(Protocol):
|
|||||||
no subtitle streams. Adapters must not raise.
|
no subtitle streams. Adapters must not raise.
|
||||||
"""
|
"""
|
||||||
...
|
...
|
||||||
|
|
||||||
|
def probe(self, video: Path) -> MediaInfo | None:
|
||||||
|
"""Return the full :class:`MediaInfo` for ``video``, or ``None``.
|
||||||
|
|
||||||
|
Covers all stream families (video, audio, subtitle) plus
|
||||||
|
file-level duration / bitrate. ``None`` signals that ffprobe is
|
||||||
|
unavailable or the file can't be read — adapters must not
|
||||||
|
raise.
|
||||||
|
"""
|
||||||
|
...
|
||||||
|
|||||||
@@ -1,5 +1,7 @@
|
|||||||
"""Shared value objects used across multiple domains."""
|
"""Shared value objects used across multiple domains."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
import re
|
import re
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
@@ -43,29 +45,21 @@ class ImdbId:
|
|||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class FilePath:
|
class FilePath:
|
||||||
"""
|
"""
|
||||||
Value object representing a file path with validation.
|
Value object representing a file path.
|
||||||
|
|
||||||
Ensures the path is valid and optionally checks existence.
|
Accepts either ``str`` or :class:`pathlib.Path` at construction;
|
||||||
|
the value is normalized to ``Path`` in ``__post_init__``.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
value: Path
|
value: Path
|
||||||
|
|
||||||
def __init__(self, path: str | Path):
|
def __post_init__(self) -> None:
|
||||||
"""
|
if isinstance(self.value, Path):
|
||||||
Initialize FilePath.
|
return
|
||||||
|
if isinstance(self.value, str):
|
||||||
Args:
|
object.__setattr__(self, "value", Path(self.value))
|
||||||
path: String or Path object representing the file path
|
return
|
||||||
"""
|
raise ValidationError(f"Path must be str or Path, got {type(self.value)}")
|
||||||
if isinstance(path, str):
|
|
||||||
path_obj = Path(path)
|
|
||||||
elif isinstance(path, Path):
|
|
||||||
path_obj = path
|
|
||||||
else:
|
|
||||||
raise ValidationError(f"Path must be str or Path, got {type(path)}")
|
|
||||||
|
|
||||||
# Use object.__setattr__ because dataclass is frozen
|
|
||||||
object.__setattr__(self, "value", path_obj)
|
|
||||||
|
|
||||||
def __str__(self) -> str:
|
def __str__(self) -> str:
|
||||||
return str(self.value)
|
return str(self.value)
|
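The move from a hand-written `__init__` to `__post_init__` can be tried out with a standalone sketch. The `ValidationError` below is a stand-in for the project's own exception class, which this hunk does not show.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


class ValidationError(ValueError):
    """Stand-in for the project's ValidationError."""


@dataclass(frozen=True)
class FilePath:
    value: Path

    def __post_init__(self) -> None:
        if isinstance(self.value, Path):
            return
        if isinstance(self.value, str):
            # Frozen dataclass: bypass the generated __setattr__ guard.
            object.__setattr__(self, "value", Path(self.value))
            return
        raise ValidationError(f"Path must be str or Path, got {type(self.value)}")

    def __str__(self) -> str:
        return str(self.value)


from_str = FilePath("movies/example.mkv")
from_path = FilePath(Path("movies/example.mkv"))
print(from_str == from_path)
```

One consequence visible in the diff itself: the old constructor took a parameter named `path`, while the generated `__init__` takes `value`, so any caller passing the argument by keyword would need to switch to `value=`.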
||||||
@@ -150,19 +144,49 @@ class Language:
|
|||||||
raise ValidationError(
|
raise ValidationError(
|
||||||
f"Language.iso must be a 3-letter ISO 639-2/B code, got {self.iso!r}"
|
f"Language.iso must be a 3-letter ISO 639-2/B code, got {self.iso!r}"
|
||||||
)
|
)
|
||||||
# Normalize iso to lowercase
|
if self.iso != self.iso.lower():
|
||||||
object.__setattr__(self, "iso", self.iso.lower())
|
raise ValidationError(
|
||||||
# Normalize aliases to a tuple of lowercase strings (dedup, preserve order)
|
f"Language.iso must be lowercase, got {self.iso!r} — "
|
||||||
|
f"use Language.from_raw() to construct from arbitrary input"
|
||||||
|
)
|
||||||
|
for alias in self.aliases:
|
||||||
|
if not isinstance(alias, str) or alias != alias.lower().strip() or not alias:
|
||||||
|
raise ValidationError(
|
||||||
|
f"Language.aliases must be lowercase non-empty strings, "
|
||||||
|
f"got {alias!r} — use Language.from_raw() to normalize"
|
||||||
|
)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def from_raw(
|
||||||
|
cls,
|
||||||
|
iso: str,
|
||||||
|
english_name: str,
|
||||||
|
native_name: str,
|
||||||
|
aliases: tuple[str, ...] | list[str] = (),
|
||||||
|
) -> Language:
|
||||||
|
"""
|
||||||
|
Construct a Language from arbitrary (possibly un-normalized) input.
|
||||||
|
|
||||||
|
Use this factory when loading from external sources (YAML, user input,
|
||||||
|
third-party APIs) — it lowercases the iso code and normalizes/dedups
|
||||||
|
the alias tuple. The direct constructor is strict and rejects
|
||||||
|
un-normalized input.
|
||||||
|
"""
|
||||||
seen: set[str] = set()
|
seen: set[str] = set()
|
||||||
normalized: list[str] = []
|
normalized: list[str] = []
|
||||||
for alias in self.aliases:
|
for alias in aliases:
|
||||||
if not isinstance(alias, str):
|
if not isinstance(alias, str):
|
||||||
continue
|
continue
|
||||||
a = alias.lower().strip()
|
a = alias.lower().strip()
|
||||||
if a and a not in seen:
|
if a and a not in seen:
|
||||||
seen.add(a)
|
seen.add(a)
|
||||||
normalized.append(a)
|
normalized.append(a)
|
||||||
object.__setattr__(self, "aliases", tuple(normalized))
|
return cls(
|
||||||
|
iso=iso.lower(),
|
||||||
|
english_name=english_name,
|
||||||
|
native_name=native_name,
|
||||||
|
aliases=tuple(normalized),
|
||||||
|
)
|
||||||
|
|
||||||
def matches(self, raw: str) -> bool:
|
def matches(self, raw: str) -> bool:
|
||||||
"""
|
"""
|
||||||
|
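The split between a strict constructor and a forgiving `from_raw` factory can be sketched as follows. This is a trimmed illustration: the three-letter length check from the real `__post_init__` is omitted, and `ValidationError` is a stand-in for the project's exception.

```python
from __future__ import annotations

from dataclasses import dataclass


class ValidationError(ValueError):
    """Stand-in for the project's ValidationError."""


@dataclass(frozen=True)
class Language:
    iso: str
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Strict: reject anything that is not already normalized.
        if self.iso != self.iso.lower():
            raise ValidationError(f"Language.iso must be lowercase, got {self.iso!r}")
        for alias in self.aliases:
            if not isinstance(alias, str) or alias != alias.lower().strip() or not alias:
                raise ValidationError(f"bad alias {alias!r}")

    @classmethod
    def from_raw(cls, iso, english_name, native_name, aliases=()) -> Language:
        # Forgiving: lowercase, strip, dedup (order preserved), drop non-strings.
        seen: set[str] = set()
        normalized: list[str] = []
        for alias in aliases:
            if not isinstance(alias, str):
                continue
            a = alias.lower().strip()
            if a and a not in seen:
                seen.add(a)
                normalized.append(a)
        return cls(iso.lower(), english_name, native_name, tuple(normalized))


lang = Language.from_raw("FRE", "French", "Français", ["FR", " fr ", "French", 3])
print(lang.iso, lang.aliases)
```

Keeping normalization out of `__post_init__` means the constructor never mutates its own arguments, at the cost of requiring loaders of external data to go through the factory.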
|||||||
@@ -1,11 +1,12 @@
|
|||||||
"""Subtitles domain — subtitle identification, classification and placement."""
|
"""Subtitles domain — subtitle identification, classification and placement."""
|
||||||
|
|
||||||
from .aggregates import SubtitleRuleSet
|
from .aggregates import SubtitleRuleSet
|
||||||
from .entities import MediaSubtitleMetadata, SubtitleCandidate
|
from .entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||||
from .exceptions import SubtitleNotFound
|
from .exceptions import SubtitleNotFound
|
||||||
from .services import PatternDetector, SubtitleIdentifier, SubtitleMatcher
|
from .services import PatternDetector, SubtitleIdentifier, SubtitleMatcher
|
||||||
from .value_objects import (
|
from .value_objects import (
|
||||||
RuleScope,
|
RuleScope,
|
||||||
|
RuleScopeLevel,
|
||||||
ScanStrategy,
|
ScanStrategy,
|
||||||
SubtitleFormat,
|
SubtitleFormat,
|
||||||
SubtitleLanguage,
|
SubtitleLanguage,
|
||||||
@@ -16,7 +17,7 @@ from .value_objects import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
"SubtitleCandidate",
|
"SubtitleScanResult",
|
||||||
"MediaSubtitleMetadata",
|
"MediaSubtitleMetadata",
|
||||||
"SubtitleRuleSet",
|
"SubtitleRuleSet",
|
||||||
"SubtitleIdentifier",
|
"SubtitleIdentifier",
|
||||||
@@ -30,5 +31,6 @@ __all__ = [
|
|||||||
"TypeDetectionMethod",
|
"TypeDetectionMethod",
|
||||||
"SubtitleMatchingRules",
|
"SubtitleMatchingRules",
|
||||||
"RuleScope",
|
"RuleScope",
|
||||||
|
"RuleScopeLevel",
|
||||||
"SubtitleNotFound",
|
"SubtitleNotFound",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ from dataclasses import dataclass, field
|
|||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from ..shared.value_objects import ImdbId
|
from ..shared.value_objects import ImdbId
|
||||||
from .value_objects import RuleScope, SubtitleMatchingRules
|
from .value_objects import RuleScope, RuleScopeLevel, SubtitleMatchingRules
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
@@ -86,10 +86,13 @@ class SubtitleRuleSet:
|
|||||||
if self._min_confidence is not None:
|
if self._min_confidence is not None:
|
||||||
delta["min_confidence"] = self._min_confidence
|
delta["min_confidence"] = self._min_confidence
|
||||||
return {
|
return {
|
||||||
"scope": {"level": self.scope.level, "identifier": self.scope.identifier},
|
"scope": {
|
||||||
|
"level": self.scope.level.value,
|
||||||
|
"identifier": self.scope.identifier,
|
||||||
|
},
|
||||||
"override": delta,
|
"override": delta,
|
||||||
}
|
}
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def global_default(cls) -> SubtitleRuleSet:
|
def global_default(cls) -> SubtitleRuleSet:
|
||||||
return cls(scope=RuleScope(level="global"))
|
return cls(scope=RuleScope(level=RuleScopeLevel.GLOBAL))
|
||||||
|
|||||||
@@ -12,16 +12,18 @@ from .value_objects import (
|
|||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class SubtitleCandidate:
|
class SubtitleScanResult:
|
||||||
"""
|
"""
|
||||||
A subtitle being scanned and matched — either an external file or an embedded stream.
|
A subtitle observed during a scan — either an external file or an embedded stream.
|
||||||
|
|
||||||
Unlike ``alfred.domain.shared.media.SubtitleTrack`` (the pure container-view
|
Unlike ``alfred.domain.shared.media.SubtitleTrack`` (the pure container-view
|
||||||
populated from ffprobe), a SubtitleCandidate carries the **flow state** of the
|
populated from ffprobe), a ``SubtitleScanResult`` carries the **flow state**
|
||||||
subtitle matching pipeline: language/format are typed value objects that may
|
of the subtitle matching pipeline: language/format are typed value objects
|
||||||
be ``None`` while classification is in progress, ``confidence`` reflects how
|
that may be ``None`` while classification is in progress, ``confidence``
|
||||||
certain we are, and ``raw_tokens`` holds the filename fragments still under
|
reflects how certain we are, and ``raw_tokens`` holds the filename fragments
|
||||||
analysis. State evolves: unknown → resolved after user clarification.
|
still under analysis. State evolves: unknown → resolved after user
|
||||||
|
clarification. The name reflects this — it's the **output of a scan pass**,
|
||||||
|
not a value object.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
# Classification (may be None if not yet resolved)
|
# Classification (may be None if not yet resolved)
|
||||||
@@ -72,7 +74,7 @@ class SubtitleCandidate:
|
|||||||
if self.is_embedded
|
if self.is_embedded
|
||||||
else str(self.file_path.name if self.file_path else "?")
|
else str(self.file_path.name if self.file_path else "?")
|
||||||
)
|
)
|
||||||
return f"SubtitleCandidate({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
|
return f"SubtitleScanResult({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
@@ -84,14 +86,14 @@ class MediaSubtitleMetadata:
|
|||||||
|
|
||||||
media_id: ImdbId | None
|
media_id: ImdbId | None
|
||||||
media_type: str # "movie" | "tv_show"
|
media_type: str # "movie" | "tv_show"
|
||||||
embedded_tracks: list[SubtitleCandidate] = field(default_factory=list)
|
embedded_tracks: list[SubtitleScanResult] = field(default_factory=list)
|
||||||
external_tracks: list[SubtitleCandidate] = field(default_factory=list)
|
external_tracks: list[SubtitleScanResult] = field(default_factory=list)
|
||||||
release_group: str | None = None
|
release_group: str | None = None
|
||||||
detected_pattern_id: str | None = None # pattern id from knowledge base
|
detected_pattern_id: str | None = None # pattern id from knowledge base
|
||||||
pattern_confirmed: bool = False
|
pattern_confirmed: bool = False
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def all_tracks(self) -> list[SubtitleCandidate]:
|
def all_tracks(self) -> list[SubtitleScanResult]:
|
||||||
return self.embedded_tracks + self.external_tracks
|
return self.embedded_tracks + self.external_tracks
|
||||||
|
|
||||||
@property
|
@property
|
||||||
@@ -99,5 +101,5 @@ class MediaSubtitleMetadata:
|
|||||||
return len(self.embedded_tracks) + len(self.external_tracks)
|
return len(self.embedded_tracks) + len(self.external_tracks)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def unresolved_tracks(self) -> list[SubtitleCandidate]:
|
def unresolved_tracks(self) -> list[SubtitleScanResult]:
|
||||||
return [t for t in self.external_tracks if t.language is None]
|
return [t for t in self.external_tracks if t.language is None]
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ from pathlib import Path
|
|||||||
from ...shared.ports import FilesystemScanner, MediaProber
|
from ...shared.ports import FilesystemScanner, MediaProber
|
||||||
from ..ports import SubtitleKnowledge
|
from ..ports import SubtitleKnowledge
|
||||||
from ...shared.value_objects import ImdbId
|
from ...shared.value_objects import ImdbId
|
||||||
from ..entities import MediaSubtitleMetadata, SubtitleCandidate
|
from ..entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||||
from ..value_objects import ScanStrategy, SubtitlePattern, SubtitleType
|
from ..value_objects import ScanStrategy, SubtitlePattern, SubtitleType
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
@@ -94,7 +94,7 @@ class SubtitleIdentifier:
|
|||||||
# Embedded tracks — via MediaProber
|
# Embedded tracks — via MediaProber
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def _scan_embedded(self, video_path: Path) -> list[SubtitleCandidate]:
|
def _scan_embedded(self, video_path: Path) -> list[SubtitleScanResult]:
|
||||||
streams = self.prober.list_subtitle_streams(video_path)
|
streams = self.prober.list_subtitle_streams(video_path)
|
||||||
|
|
||||||
tracks = []
|
tracks = []
|
||||||
@@ -111,7 +111,7 @@ class SubtitleIdentifier:
|
|||||||
stype = SubtitleType.STANDARD
|
stype = SubtitleType.STANDARD
|
||||||
|
|
||||||
tracks.append(
|
tracks.append(
|
||||||
SubtitleCandidate(
|
SubtitleScanResult(
|
||||||
language=lang,
|
language=lang,
|
||||||
format=None,
|
format=None,
|
||||||
subtitle_type=stype,
|
subtitle_type=stype,
|
||||||
@@ -131,7 +131,7 @@ class SubtitleIdentifier:
|
|||||||
|
|
||||||
def _scan_external(
|
def _scan_external(
|
||||||
self, video_path: Path, pattern: SubtitlePattern
|
self, video_path: Path, pattern: SubtitlePattern
|
||||||
) -> list[SubtitleCandidate]:
|
) -> list[SubtitleScanResult]:
|
||||||
strategy = pattern.scan_strategy
|
strategy = pattern.scan_strategy
|
||||||
episode_stem: str | None = None
|
episode_stem: str | None = None
|
||||||
|
|
||||||
@@ -200,7 +200,7 @@ class SubtitleIdentifier:
|
|||||||
entries: list,
|
entries: list,
|
||||||
pattern: SubtitlePattern,
|
pattern: SubtitlePattern,
|
||||||
episode_stem: str | None = None,
|
episode_stem: str | None = None,
|
||||||
) -> list[SubtitleCandidate]:
|
) -> list[SubtitleScanResult]:
|
||||||
tracks = [
|
tracks = [
|
||||||
self._classify_single(entry, episode_stem=episode_stem) for entry in entries
|
self._classify_single(entry, episode_stem=episode_stem) for entry in entries
|
||||||
]
|
]
|
||||||
@@ -214,7 +214,7 @@ class SubtitleIdentifier:
|
|||||||
|
|
||||||
def _classify_single(
|
def _classify_single(
|
||||||
self, entry, episode_stem: str | None = None
|
self, entry, episode_stem: str | None = None
|
||||||
) -> SubtitleCandidate:
|
) -> SubtitleScanResult:
|
||||||
fmt = self.kb.format_for_extension(entry.suffix)
|
fmt = self.kb.format_for_extension(entry.suffix)
|
||||||
tokens = (
|
tokens = (
|
||||||
_tokenize_suffix(entry.stem, episode_stem)
|
_tokenize_suffix(entry.stem, episode_stem)
|
||||||
@@ -253,7 +253,7 @@ class SubtitleIdentifier:
|
|||||||
if entry.suffix.lower() == ".srt":
|
if entry.suffix.lower() == ".srt":
|
||||||
entry_count = _count_entries(self.scanner.read_text(entry.path))
|
entry_count = _count_entries(self.scanner.read_text(entry.path))
|
||||||
|
|
||||||
return SubtitleCandidate(
|
return SubtitleScanResult(
|
||||||
language=language,
|
language=language,
|
||||||
format=fmt,
|
format=fmt,
|
||||||
subtitle_type=subtitle_type,
|
subtitle_type=subtitle_type,
|
||||||
@@ -266,8 +266,8 @@ class SubtitleIdentifier:
|
|||||||
)
|
)
|
||||||
|
|
||||||
def _disambiguate_by_size(
|
def _disambiguate_by_size(
|
||||||
self, tracks: list[SubtitleCandidate]
|
self, tracks: list[SubtitleScanResult]
|
||||||
) -> list[SubtitleCandidate]:
|
) -> list[SubtitleScanResult]:
|
||||||
"""
|
"""
|
||||||
When multiple tracks share the same language and type is UNKNOWN/STANDARD,
|
When multiple tracks share the same language and type is UNKNOWN/STANDARD,
|
||||||
the one with the most entries (lines) is SDH, the smallest is FORCED if
|
the one with the most entries (lines) is SDH, the smallest is FORCED if
|
||||||
@@ -277,7 +277,7 @@ class SubtitleIdentifier:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
# Group by language code
|
# Group by language code
|
||||||
lang_groups: dict[str, list[SubtitleCandidate]] = {}
|
lang_groups: dict[str, list[SubtitleScanResult]] = {}
|
||||||
for track in tracks:
|
for track in tracks:
|
||||||
key = track.language.code if track.language else "__unknown__"
|
key = track.language.code if track.language else "__unknown__"
|
||||||
lang_groups.setdefault(key, []).append(track)
|
lang_groups.setdefault(key, []).append(track)
|
||||||
@@ -306,6 +306,6 @@ class SubtitleIdentifier:
|
|||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
def _set_type(self, track: SubtitleCandidate, stype: SubtitleType) -> None:
|
def _set_type(self, track: SubtitleScanResult, stype: SubtitleType) -> None:
|
||||||
"""Mutate track type in-place."""
|
"""Mutate track type in-place."""
|
||||||
track.subtitle_type = stype
|
track.subtitle_type = stype
|
||||||
|
|||||||
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
||||||
from ..entities import SubtitleCandidate
|
from ..entities import SubtitleScanResult
|
||||||
from ..value_objects import SubtitleMatchingRules
|
from ..value_objects import SubtitleMatchingRules
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
@@ -10,7 +10,7 @@ logger = logging.getLogger(__name__)
|
|||||||
|
|
||||||
class SubtitleMatcher:
|
class SubtitleMatcher:
|
||||||
"""
|
"""
|
||||||
Filters a list of SubtitleCandidate against effective SubtitleMatchingRules.
|
Filters a list of SubtitleScanResult against effective SubtitleMatchingRules.
|
||||||
|
|
||||||
Returns matched tracks (pass all filters, confidence >= min_confidence)
|
Returns matched tracks (pass all filters, confidence >= min_confidence)
|
||||||
and unresolved tracks (need user clarification).
|
and unresolved tracks (need user clarification).
|
||||||
@@ -21,14 +21,14 @@ class SubtitleMatcher:
|
|||||||
|
|
||||||
def match(
|
def match(
|
||||||
self,
|
self,
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
rules: SubtitleMatchingRules,
|
rules: SubtitleMatchingRules,
|
||||||
) -> tuple[list[SubtitleCandidate], list[SubtitleCandidate]]:
|
) -> tuple[list[SubtitleScanResult], list[SubtitleScanResult]]:
|
||||||
"""
|
"""
|
||||||
Returns (matched, unresolved).
|
Returns (matched, unresolved).
|
||||||
"""
|
"""
|
||||||
matched: list[SubtitleCandidate] = []
|
matched: list[SubtitleScanResult] = []
|
||||||
unresolved: list[SubtitleCandidate] = []
|
unresolved: list[SubtitleScanResult] = []
|
||||||
|
|
||||||
for track in tracks:
|
for track in tracks:
|
||||||
if track.is_embedded:
|
if track.is_embedded:
|
||||||
@@ -51,7 +51,7 @@ class SubtitleMatcher:
|
|||||||
return matched, unresolved
|
return matched, unresolved
|
||||||
|
|
||||||
def _passes_filters(
|
def _passes_filters(
|
||||||
self, track: SubtitleCandidate, rules: SubtitleMatchingRules
|
self, track: SubtitleScanResult, rules: SubtitleMatchingRules
|
||||||
) -> bool:
|
) -> bool:
|
||||||
# Language filter
|
# Language filter
|
||||||
if rules.preferred_languages:
|
if rules.preferred_languages:
|
||||||
@@ -76,14 +76,14 @@ class SubtitleMatcher:
|
|||||||
|
|
||||||
def _resolve_conflicts(
|
def _resolve_conflicts(
|
||||||
self,
|
self,
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
rules: SubtitleMatchingRules,
|
rules: SubtitleMatchingRules,
|
||||||
) -> list[SubtitleCandidate]:
|
) -> list[SubtitleScanResult]:
|
||||||
"""
|
"""
|
||||||
When multiple tracks have same language + type, keep only the best one
|
When multiple tracks have same language + type, keep only the best one
|
||||||
according to format_priority. If no format_priority applies, keep the first.
|
according to format_priority. If no format_priority applies, keep the first.
|
||||||
"""
|
"""
|
||||||
seen: dict[tuple, SubtitleCandidate] = {}
|
seen: dict[tuple, SubtitleScanResult] = {}
|
||||||
|
|
||||||
for track in tracks:
|
for track in tracks:
|
||||||
lang = track.language.code if track.language else None
|
lang = track.language.code if track.language else None
|
||||||
@@ -106,8 +106,8 @@ class SubtitleMatcher:
|
|||||||
|
|
||||||
def _prefer(
|
def _prefer(
|
||||||
self,
|
self,
|
||||||
candidate: SubtitleCandidate,
|
candidate: SubtitleScanResult,
|
||||||
existing: SubtitleCandidate,
|
existing: SubtitleScanResult,
|
||||||
format_priority: list[str],
|
format_priority: list[str],
|
||||||
) -> bool:
|
) -> bool:
|
||||||
"""Return True if candidate is preferable to existing."""
|
"""Return True if candidate is preferable to existing."""
|
||||||
|
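Review note: the bodies of `_resolve_conflicts` and `_prefer` are not visible in these hunks, only their docstrings. The sketch below is one possible reading of those docstrings (keep one track per language + type, prefer the format that ranks earlier in `format_priority`, otherwise keep the first). `Track` is a hypothetical stand-in for `SubtitleScanResult`, and the ranking helper is an assumption, not the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:  # hypothetical stand-in for SubtitleScanResult
    language: Optional[str]
    subtitle_type: str
    fmt: Optional[str]


def prefer(candidate: Track, existing: Track, format_priority: list) -> bool:
    """Return True if candidate ranks strictly earlier in format_priority."""

    def rank(track: Track) -> int:
        # Formats absent from the priority list sort after every listed one.
        if track.fmt in format_priority:
            return format_priority.index(track.fmt)
        return len(format_priority)

    return rank(candidate) < rank(existing)


def resolve_conflicts(tracks: list, format_priority: list) -> list:
    """Keep a single track per (language, type) key."""
    best: dict = {}
    for track in tracks:
        key = (track.language, track.subtitle_type)
        existing = best.get(key)
        # Strict comparison in prefer() means ties keep the first track seen.
        if existing is None or prefer(track, existing, format_priority):
            best[key] = track
    return list(best.values())


tracks = [
    Track("eng", "standard", "ass"),
    Track("eng", "standard", "srt"),
    Track("fra", "standard", "srt"),
]
print([t.fmt for t in resolve_conflicts(tracks, ["srt", "ass"])])  # ['srt', 'srt']
print([t.fmt for t in resolve_conflicts(tracks, [])])              # ['ass', 'srt']
```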
|||||||
@@ -1,9 +1,9 @@
|
|||||||
"""Subtitle service utilities."""
|
"""Subtitle service utilities."""
|
||||||
|
|
||||||
from ..entities import SubtitleCandidate
|
from ..entities import SubtitleScanResult
|
||||||
|
|
||||||
|
|
||||||
def available_subtitles(tracks: list[SubtitleCandidate]) -> list[SubtitleCandidate]:
|
def available_subtitles(tracks: list[SubtitleScanResult]) -> list[SubtitleScanResult]:
|
||||||
"""
|
"""
|
||||||
Return the distinct subtitle tracks available, deduped by (language, type).
|
Return the distinct subtitle tracks available, deduped by (language, type).
|
||||||
|
|
||||||
@@ -11,7 +11,7 @@ def available_subtitles(tracks: list[SubtitleCandidate]) -> list[SubtitleCandida
|
|||||||
preferences — e.g. eng, eng.sdh, fra all show up as separate entries.
|
preferences — e.g. eng, eng.sdh, fra all show up as separate entries.
|
||||||
"""
|
"""
|
||||||
seen: set[tuple] = set()
|
seen: set[tuple] = set()
|
||||||
result: list[SubtitleCandidate] = []
|
result: list[SubtitleScanResult] = []
|
||||||
for track in tracks:
|
for track in tracks:
|
||||||
lang = track.language.code if track.language else None
|
lang = track.language.code if track.language else None
|
||||||
key = (lang, track.subtitle_type)
|
key = (lang, track.subtitle_type)
|
||||||
|
|||||||
@@ -83,9 +83,20 @@ class SubtitleMatchingRules:
|
|||||||
min_confidence: float = 0.7
|
min_confidence: float = 0.7
|
||||||
|
|
||||||
|
|
||||||
|
class RuleScopeLevel(str, Enum):
|
||||||
|
"""At which level a subtitle rule set applies."""
|
||||||
|
|
||||||
|
GLOBAL = "global"
|
||||||
|
RELEASE_GROUP = "release_group"
|
||||||
|
MOVIE = "movie"
|
||||||
|
SHOW = "show"
|
||||||
|
SEASON = "season"
|
||||||
|
EPISODE = "episode"
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class RuleScope:
|
class RuleScope:
|
||||||
"""At which level a rule set applies."""
|
"""At which level a rule set applies."""
|
||||||
|
|
||||||
level: str # "global" | "release_group" | "movie" | "show" | "season" | "episode"
|
level: RuleScopeLevel
|
||||||
identifier: str | None = None # imdb_id, group name, "S01", "S01E03"…
|
identifier: str | None = None # imdb_id, group name, "S01", "S01E03"…
|
||||||
|
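Review note: replacing the free-form `level: str` with `RuleScopeLevel` means a misspelled level fails at construction instead of travelling through the rule pipeline. The sketch below re-declares the enum and `RuleScope` from this hunk and adds two helpers, `scope_to_dict` and `scope_from_dict`, which are hypothetical names used only to show the round trip through `.value` that `SubtitleRuleSet.to_dict` now relies on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RuleScopeLevel(str, Enum):
    """At which level a subtitle rule set applies."""

    GLOBAL = "global"
    RELEASE_GROUP = "release_group"
    MOVIE = "movie"
    SHOW = "show"
    SEASON = "season"
    EPISODE = "episode"


@dataclass(frozen=True)
class RuleScope:
    level: RuleScopeLevel
    identifier: Optional[str] = None


def scope_to_dict(scope: RuleScope) -> dict:
    # Serialize the plain string so the stored form stays "global", "season", ...
    return {"level": scope.level.value, "identifier": scope.identifier}


def scope_from_dict(data: dict) -> RuleScope:
    # Enum lookup by value raises ValueError on an unknown level.
    return RuleScope(
        level=RuleScopeLevel(data["level"]),
        identifier=data.get("identifier"),
    )


scope = RuleScope(level=RuleScopeLevel.SEASON, identifier="S01")
assert scope_from_dict(scope_to_dict(scope)) == scope
```

Because the enum mixes in `str`, comparisons such as `scope.level == "season"` still hold, which should limit the churn for callers that compare against literals.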
|||||||
@@ -47,16 +47,19 @@ from .value_objects import (
|
|||||||
# ════════════════════════════════════════════════════════════════════════════
|
# ════════════════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
@dataclass(eq=False)
|
@dataclass(frozen=True, eq=False)
|
||||||
class Episode(MediaWithTracks):
|
class Episode(MediaWithTracks):
|
||||||
"""
|
"""
|
||||||
A single episode of a TV show — leaf of the TVShow aggregate.
|
A single episode of a TV show — leaf of the TVShow aggregate.
|
||||||
|
|
||||||
Carries the file metadata (path, size) and the discovered tracks
|
Carries the file metadata (path, size) and the discovered tracks
|
||||||
(audio + subtitle). Track lists are populated by the ffprobe + subtitle
|
(audio + subtitle). Track tuples are populated by the ffprobe + subtitle
|
||||||
scan pipeline; they may be empty when the episode is known but not yet
|
scan pipeline; they may be empty when the episode is known but not yet
|
||||||
scanned, or when no file is downloaded yet.
|
scanned, or when no file is downloaded yet.
|
||||||
|
|
||||||
|
Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
|
||||||
|
onto a new instance.
|
||||||
|
|
||||||
Equality is identity-based within the aggregate: two ``Episode`` instances
|
Equality is identity-based within the aggregate: two ``Episode`` instances
|
||||||
are equal iff they share the same ``(season_number, episode_number)``,
|
are equal iff they share the same ``(season_number, episode_number)``,
|
||||||
regardless of title/file/track contents. The root TVShow guarantees
|
regardless of title/file/track contents. The root TVShow guarantees
|
||||||
@@ -68,17 +71,21 @@ class Episode(MediaWithTracks):
|
|||||||
title: str
|
title: str
|
||||||
file_path: FilePath | None = None
|
file_path: FilePath | None = None
|
||||||
file_size: FileSize | None = None
|
file_size: FileSize | None = None
|
||||||
audio_tracks: list[AudioTrack] = field(default_factory=list)
|
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||||
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
|
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||||
|
|
||||||
def __post_init__(self) -> None:
|
def __post_init__(self) -> None:
|
||||||
# Coerce numbers if raw ints were passed
|
# Coerce numbers if raw ints were passed
|
||||||
if not isinstance(self.season_number, SeasonNumber):
|
if not isinstance(self.season_number, SeasonNumber):
|
||||||
if isinstance(self.season_number, int):
|
if isinstance(self.season_number, int):
|
||||||
self.season_number = SeasonNumber(self.season_number)
|
object.__setattr__(
|
||||||
|
self, "season_number", SeasonNumber(self.season_number)
|
||||||
|
)
|
||||||
if not isinstance(self.episode_number, EpisodeNumber):
|
if not isinstance(self.episode_number, EpisodeNumber):
|
||||||
if isinstance(self.episode_number, int):
|
if isinstance(self.episode_number, int):
|
||||||
self.episode_number = EpisodeNumber(self.episode_number)
|
object.__setattr__(
|
||||||
|
self, "episode_number", EpisodeNumber(self.episode_number)
|
||||||
|
)
|
||||||
|
|
||||||
def __eq__(self, other: object) -> bool:
|
def __eq__(self, other: object) -> bool:
|
||||||
if not isinstance(other, Episode):
|
if not isinstance(other, Episode):
|
||||||
|
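Review note: once `Episode` is frozen, `__post_init__` can no longer assign to `self` directly, hence the switch to `object.__setattr__` in this hunk. A minimal sketch of that pattern, together with the `dataclasses.replace` rebuild the docstring mentions, is given below. `EpisodeSketch` and the one-field `SeasonNumber` are simplified stand-ins, not the real domain classes.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SeasonNumber:  # simplified stand-in for the real value object
    value: int


@dataclass(frozen=True)
class EpisodeSketch:
    season_number: SeasonNumber  # a raw int is also accepted and coerced
    title: str
    audio_tracks: tuple = ()

    def __post_init__(self) -> None:
        # Normal attribute assignment raises on a frozen dataclass, so the
        # coercion goes through object.__setattr__.
        if isinstance(self.season_number, int):
            object.__setattr__(
                self, "season_number", SeasonNumber(self.season_number)
            )


ep = EpisodeSketch(season_number=3, title="Pilot")
enriched = replace(ep, audio_tracks=("eac3",))  # new instance, ep is untouched

try:
    ep.title = "Renamed"
except FrozenInstanceError:
    print("frozen: build a new instance with dataclasses.replace instead")
```

`replace` goes through `__init__` again, so `__post_init__` also runs on the rebuilt instance and the coercion stays idempotent.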
|||||||
@@ -1,121 +0,0 @@
|
|||||||
"""ffprobe — infrastructure adapter for extracting MediaInfo from a video file."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
import subprocess
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
_FFPROBE_CMD = [
|
|
||||||
"ffprobe",
|
|
||||||
"-v",
|
|
||||||
"quiet",
|
|
||||||
"-print_format",
|
|
||||||
"json",
|
|
||||||
"-show_streams",
|
|
||||||
"-show_format",
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
def probe(path: Path) -> MediaInfo | None:
|
|
||||||
"""
|
|
||||||
Run ffprobe on path and return a MediaInfo.
|
|
||||||
|
|
||||||
Returns None if ffprobe is not available or the file cannot be probed.
|
|
||||||
"""
|
|
||||||
try:
|
|
||||||
result = subprocess.run(
|
|
||||||
[*_FFPROBE_CMD, str(path)],
|
|
||||||
capture_output=True,
|
|
||||||
text=True,
|
|
||||||
timeout=30,
|
|
||||||
check=False,
|
|
||||||
)
|
|
||||||
except subprocess.TimeoutExpired:
|
|
||||||
logger.warning("ffprobe timed out on %s", path)
|
|
||||||
return None
|
|
||||||
|
|
||||||
if result.returncode != 0:
|
|
||||||
logger.warning("ffprobe failed on %s: %s", path, result.stderr.strip())
|
|
||||||
return None
|
|
||||||
|
|
||||||
try:
|
|
||||||
data = json.loads(result.stdout)
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
logger.warning("ffprobe returned invalid JSON for %s", path)
|
|
||||||
return None
|
|
||||||
|
|
||||||
return _parse(data)
|
|
||||||
|
|
||||||
|
|
||||||
def _parse(data: dict) -> MediaInfo:
|
|
||||||
streams = data.get("streams", [])
|
|
||||||
fmt = data.get("format", {})
|
|
||||||
|
|
||||||
# File-level duration/bitrate (ffprobe ``format`` block — independent of streams)
|
|
||||||
duration_seconds: float | None = None
|
|
||||||
bitrate_kbps: int | None = None
|
|
||||||
if "duration" in fmt:
|
|
||||||
try:
|
|
||||||
duration_seconds = float(fmt["duration"])
|
|
||||||
except ValueError:
|
|
||||||
pass
|
|
||||||
if "bit_rate" in fmt:
|
|
||||||
try:
|
|
||||||
bitrate_kbps = int(fmt["bit_rate"]) // 1000
|
|
||||||
except ValueError:
|
|
||||||
pass
|
|
||||||
|
|
||||||
video_tracks: list[VideoTrack] = []
|
|
||||||
audio_tracks: list[AudioTrack] = []
|
|
||||||
subtitle_tracks: list[SubtitleTrack] = []
|
|
||||||
|
|
||||||
for stream in streams:
|
|
||||||
codec_type = stream.get("codec_type")
|
|
||||||
|
|
||||||
if codec_type == "video":
|
|
||||||
video_tracks.append(
|
|
||||||
VideoTrack(
|
|
||||||
index=stream.get("index", len(video_tracks)),
|
|
||||||
codec=stream.get("codec_name"),
|
|
||||||
width=stream.get("width"),
|
|
||||||
height=stream.get("height"),
|
|
||||||
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
elif codec_type == "audio":
|
|
||||||
audio_tracks.append(
|
|
||||||
AudioTrack(
|
|
||||||
index=stream.get("index", len(audio_tracks)),
|
|
||||||
codec=stream.get("codec_name"),
|
|
||||||
channels=stream.get("channels"),
|
|
||||||
channel_layout=stream.get("channel_layout"),
|
|
||||||
language=stream.get("tags", {}).get("language"),
|
|
||||||
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
elif codec_type == "subtitle":
|
|
||||||
subtitle_tracks.append(
|
|
||||||
SubtitleTrack(
|
|
||||||
index=stream.get("index", len(subtitle_tracks)),
|
|
||||||
codec=stream.get("codec_name"),
|
|
||||||
language=stream.get("tags", {}).get("language"),
|
|
||||||
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
|
||||||
is_forced=stream.get("disposition", {}).get("forced", 0) == 1,
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
return MediaInfo(
|
|
||||||
video_tracks=tuple(video_tracks),
|
|
||||||
audio_tracks=tuple(audio_tracks),
|
|
||||||
subtitle_tracks=tuple(subtitle_tracks),
|
|
||||||
duration_seconds=duration_seconds,
|
|
||||||
bitrate_kbps=bitrate_kbps,
|
|
||||||
)
|
|
||||||
@@ -87,7 +87,7 @@ class LanguageRegistry:
|
|||||||
merged = _merge_language_entries(builtin, learned)
|
merged = _merge_language_entries(builtin, learned)
|
||||||
|
|
||||||
for iso, entry in merged.items():
|
for iso, entry in merged.items():
|
||||||
language = Language(
|
language = Language.from_raw(
|
||||||
iso=iso,
|
iso=iso,
|
||||||
english_name=entry.get("english_name", iso),
|
english_name=entry.get("english_name", iso),
|
||||||
native_name=entry.get("native_name", iso),
|
native_name=entry.get("native_name", iso),
|
||||||
|
|||||||
@@ -16,9 +16,11 @@ import alfred as _alfred_pkg
|
|||||||
|
|
||||||
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge" / "release"
|
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge" / "release"
|
||||||
_SITES_ROOT = _BUILTIN_ROOT / "sites"
|
_SITES_ROOT = _BUILTIN_ROOT / "sites"
|
||||||
|
_GROUPS_ROOT = _BUILTIN_ROOT / "release_groups"
|
||||||
_LEARNED_ROOT = (
|
_LEARNED_ROOT = (
|
||||||
Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge" / "release"
|
Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge" / "release"
|
||||||
)
|
)
|
||||||
|
_LEARNED_GROUPS_ROOT = _LEARNED_ROOT / "release_groups"
|
||||||
|
|
||||||
|
|
||||||
def _merge(base: dict, overlay: dict) -> dict:
|
def _merge(base: dict, overlay: dict) -> dict:
|
||||||
@@ -62,6 +64,15 @@ def load_sources() -> set[str]:
|
|||||||
return set(_load("sources.yaml").get("sources", []))
|
return set(_load("sources.yaml").get("sources", []))
|
||||||
|
|
||||||
|
|
||||||
|
def load_distributors() -> set[str]:
|
||||||
|
"""Streaming distributor tokens (NF, AMZN, DSNP, …).
|
||||||
|
|
||||||
|
Distinct from ``load_sources()`` — distributors are uppercase scene
|
||||||
|
tags identifying the platform, not the capture origin.
|
||||||
|
"""
|
||||||
|
return {t.upper() for t in _load("distributors.yaml").get("distributors", [])}
|
||||||
|
|
||||||
|
|
||||||
def load_codecs() -> set[str]:
|
def load_codecs() -> set[str]:
|
||||||
return set(_load("codecs.yaml").get("codecs", []))
|
return set(_load("codecs.yaml").get("codecs", []))
|
||||||
|
|
||||||
@@ -128,6 +139,88 @@ def load_media_type_tokens() -> dict:
|
|||||||
return _load_sites().get("media_type_tokens", {})
|
return _load_sites().get("media_type_tokens", {})
|
||||||
|
|
||||||
|
|
||||||
|
def load_group_schemas() -> dict:
|
||||||
|
"""Load every release-group schema YAML keyed by uppercase group name.
|
||||||
|
|
||||||
|
Builtin schemas in ``alfred/knowledge/release/release_groups/`` are
|
||||||
|
merged with user-learned schemas in
|
||||||
|
``data/knowledge/release/release_groups/`` (the learned ones win on
|
||||||
|
name collision).
|
||||||
|
"""
|
||||||
|
result: dict = {}
|
||||||
|
for root in (_GROUPS_ROOT, _LEARNED_GROUPS_ROOT):
|
||||||
|
if not root.is_dir():
|
||||||
|
continue
|
||||||
|
for path in sorted(root.glob("*.yaml")):
|
||||||
|
data = _read(path)
|
||||||
|
name = data.get("name")
|
||||||
|
if not name:
|
||||||
|
continue
|
||||||
|
result[name.upper()] = data
|
||||||
|
return result
|
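Review note: the "learned wins on name collision" behaviour of `load_group_schemas` follows purely from iteration order: builtin entries are written first and learned entries overwrite them under the same uppercase key. The sketch below reproduces that with in-memory dicts instead of YAML files; `merge_group_schemas` and the group name are hypothetical.

```python
def merge_group_schemas(builtin: list, learned: list) -> dict:
    """Key schemas by uppercase name; later sources overwrite earlier ones."""
    result: dict = {}
    for source in (builtin, learned):
        for data in source:
            name = data.get("name")
            if not name:
                continue  # a schema without a name cannot be keyed
            result[name.upper()] = data
    return result


builtin = [{"name": "ExampleGrp", "separator": "."}, {"separator": "-"}]
learned = [{"name": "EXAMPLEGRP", "separator": "-"}]
print(merge_group_schemas(builtin, learned))
# {'EXAMPLEGRP': {'name': 'EXAMPLEGRP', 'separator': '-'}}
```

Note that the override is whole-schema: a learned file replaces the builtin entry rather than being merged into it field by field.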
||||||
|
|
||||||
|
|
||||||
|
def load_scoring() -> dict:
|
||||||
|
"""Load the parse-scoring config.
|
||||||
|
|
||||||
|
Returns a dict with three top-level keys: ``weights``, ``penalties``,
|
||||||
|
``thresholds``. Defaults are baked in so a missing or partial YAML
|
||||||
|
never breaks the parser — only de-tunes it.
|
||||||
|
"""
|
||||||
|
raw = _load("scoring.yaml")
|
||||||
|
weights = {
|
||||||
|
"title": 30,
|
||||||
|
"media_type": 20,
|
||||||
|
"year": 15,
|
||||||
|
"season": 10,
|
||||||
|
"episode": 5,
|
||||||
|
"resolution": 5,
|
||||||
|
"source": 5,
|
||||||
|
"codec": 5,
|
||||||
|
"group": 5,
|
||||||
|
}
|
||||||
|
weights.update(raw.get("weights", {}) or {})
|
||||||
|
penalties = {"unknown_token": 5, "max_unknown_penalty": 30}
|
||||||
|
penalties.update(raw.get("penalties", {}) or {})
|
||||||
|
thresholds = {"shitty_min": 60}
|
||||||
|
thresholds.update(raw.get("thresholds", {}) or {})
|
||||||
|
return {
|
||||||
|
"weights": weights,
|
||||||
|
"penalties": penalties,
|
||||||
|
"thresholds": thresholds,
|
||||||
|
}
|
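Review note: the default overlay in `load_scoring` can be checked without a YAML file. The sketch below takes the already-parsed dict as an argument (`build_scoring` is a hypothetical name) and uses the same defaults as the hunk; the `or {}` guard is what lets a section written as an empty key (parsed as `None`) fall back cleanly.

```python
def build_scoring(raw: dict) -> dict:
    """Overlay a possibly partial config onto the baked-in defaults."""
    weights = {
        "title": 30,
        "media_type": 20,
        "year": 15,
        "season": 10,
        "episode": 5,
        "resolution": 5,
        "source": 5,
        "codec": 5,
        "group": 5,
    }
    weights.update(raw.get("weights", {}) or {})
    penalties = {"unknown_token": 5, "max_unknown_penalty": 30}
    penalties.update(raw.get("penalties", {}) or {})
    thresholds = {"shitty_min": 60}
    thresholds.update(raw.get("thresholds", {}) or {})
    return {"weights": weights, "penalties": penalties, "thresholds": thresholds}


cfg = build_scoring({"weights": {"title": 40}, "penalties": None})
print(cfg["weights"]["title"], cfg["weights"]["year"])  # 40 15
print(cfg["penalties"]["unknown_token"])                # 5
```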
||||||
|
|
||||||
|
|
||||||
|
def load_probe_mappings() -> dict:
|
||||||
|
"""Load ffprobe→scene-token translation tables.
|
||||||
|
|
||||||
|
Returns a dict with three keys:
|
||||||
|
|
||||||
|
- ``video_codec``: ``{ffprobe_codec_lower: scene_token}``
|
||||||
|
- ``audio_codec``: ``{ffprobe_codec_lower: scene_token}``
|
||||||
|
- ``audio_channels``: ``{channel_count_int: layout_str}``
|
||||||
|
|
||||||
|
Channel-count keys are normalized to ``int`` here so the consumer can
|
||||||
|
look up ``track.channels`` directly. Missing sections fall back to
|
||||||
|
empty dicts — the enrichment code degrades to its uppercase-fallback
|
||||||
|
path when a mapping is absent.
|
||||||
|
"""
|
||||||
|
raw = _load("probe_mappings.yaml")
|
||||||
|
video_codec = {k.lower(): v for k, v in (raw.get("video_codec") or {}).items()}
|
||||||
|
audio_codec = {k.lower(): v for k, v in (raw.get("audio_codec") or {}).items()}
|
||||||
|
audio_channels: dict[int, str] = {}
|
||||||
|
for k, v in (raw.get("audio_channels") or {}).items():
|
||||||
|
try:
|
||||||
|
audio_channels[int(k)] = v
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
continue
|
||||||
|
return {
|
||||||
|
"video_codec": video_codec,
|
||||||
|
"audio_codec": audio_codec,
|
||||||
|
"audio_channels": audio_channels,
|
||||||
|
}
|
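Review note: the channel-count normalization in `load_probe_mappings` matters because YAML keys may arrive as strings or ints depending on how the file is written. A small sketch of just that loop, with the hypothetical name `normalize_channels` and made-up sample values:

```python
def normalize_channels(raw_channels) -> dict:
    """Coerce channel-count keys to int, dropping keys that cannot be parsed."""
    out: dict = {}
    for key, layout in (raw_channels or {}).items():
        try:
            out[int(key)] = layout
        except (TypeError, ValueError):
            continue  # e.g. a None key or a non-numeric string
    return out


print(normalize_channels({"6": "5.1", 2: "2.0", "stereo": "2.0", None: "?"}))
# {6: '5.1', 2: '2.0'}
```

With int keys in place, a consumer can look up `track.channels` directly, as the docstring in this hunk describes.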
||||||
|
|
||||||
|
|
||||||
def load_separators() -> list[str]:
|
def load_separators() -> list[str]:
|
||||||
"""Single-char token separators used by the release name tokenizer.
|
"""Single-char token separators used by the release name tokenizer.
|
||||||
|
|
||||||
|
|||||||
@@ -14,17 +14,24 @@ filesystem-level concerns.
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from alfred.domain.release.parser.schema import GroupSchema, SchemaChunk
|
||||||
|
from alfred.domain.release.parser.tokens import TokenRole
|
||||||
|
|
||||||
from .release import (
|
from .release import (
|
||||||
load_audio,
|
load_audio,
|
||||||
load_codecs,
|
load_codecs,
|
||||||
|
load_distributors,
|
||||||
load_editions,
|
load_editions,
|
||||||
load_forbidden_chars,
|
load_forbidden_chars,
|
||||||
|
load_group_schemas,
|
||||||
load_hdr_extra,
|
load_hdr_extra,
|
||||||
load_language_tokens,
|
load_language_tokens,
|
||||||
load_media_type_tokens,
|
load_media_type_tokens,
|
||||||
load_metadata_extensions,
|
load_metadata_extensions,
|
||||||
load_non_video_extensions,
|
load_non_video_extensions,
|
||||||
|
load_probe_mappings,
|
||||||
load_resolutions,
|
load_resolutions,
|
||||||
|
load_scoring,
|
||||||
load_separators,
|
load_separators,
|
||||||
load_sources,
|
load_sources,
|
||||||
load_sources_extra,
|
load_sources_extra,
|
||||||
@@ -35,6 +42,26 @@ from .release import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _build_group_schema(data: dict) -> GroupSchema:
|
||||||
|
"""Translate a raw YAML schema dict into a frozen :class:`GroupSchema`.
|
||||||
|
|
||||||
|
Unknown roles raise ``ValueError`` early so a typo in a YAML file
|
||||||
|
surfaces at construction time, not on first parse.
|
||||||
|
"""
|
||||||
|
chunks = tuple(
|
||||||
|
SchemaChunk(
|
||||||
|
role=TokenRole(entry["role"]),
|
||||||
|
optional=bool(entry.get("optional", False)),
|
||||||
|
)
|
||||||
|
for entry in data.get("chunk_order", [])
|
||||||
|
)
|
||||||
|
return GroupSchema(
|
||||||
|
name=data["name"],
|
||||||
|
separator=data.get("separator", "."),
|
||||||
|
chunks=chunks,
|
||||||
|
)
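The early-failure behaviour described in the `_build_group_schema` docstring can be sketched in isolation. This is a minimal illustration, not the project's code: the `TokenRole` members and the `build_chunks` helper below are hypothetical stand-ins, and the real enum in `alfred.domain.release.parser.tokens` may define different values. It relies only on the standard behaviour that calling an `Enum` class with a value that is not a member raises `ValueError`.

```python
from dataclasses import dataclass
from enum import Enum


class TokenRole(Enum):
    # Hypothetical subset of roles, for illustration only.
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"
    GROUP = "group"


@dataclass(frozen=True)
class SchemaChunk:
    role: TokenRole
    optional: bool = False


def build_chunks(chunk_order: list[dict]) -> tuple[SchemaChunk, ...]:
    # TokenRole(value) raises ValueError when value is not a member,
    # so a misspelled role fails here instead of during a later parse.
    return tuple(
        SchemaChunk(
            role=TokenRole(entry["role"]),
            optional=bool(entry.get("optional", False)),
        )
        for entry in chunk_order
    )


ok = build_chunks([{"role": "title"}, {"role": "year", "optional": True}])

try:
    build_chunks([{"role": "resolutoin"}])  # deliberate typo
    typo_error = None
except ValueError as exc:
    typo_error = str(exc)
```

Because the chunks are built eagerly, a bad YAML file would surface as soon as the knowledge object is constructed, which matches the intent stated in the docstring.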
|
||||||
|
|
||||||
|
|
||||||
class YamlReleaseKnowledge:
|
class YamlReleaseKnowledge:
|
||||||
"""Single object holding every parsed-release knowledge constant.
|
"""Single object holding every parsed-release knowledge constant.
|
||||||
|
|
||||||
@@ -48,6 +75,7 @@ class YamlReleaseKnowledge:
|
|||||||
self.resolutions: set[str] = load_resolutions()
|
self.resolutions: set[str] = load_resolutions()
|
||||||
self.sources: set[str] = load_sources() | load_sources_extra()
|
self.sources: set[str] = load_sources() | load_sources_extra()
|
||||||
self.codecs: set[str] = load_codecs()
|
self.codecs: set[str] = load_codecs()
|
||||||
|
self.distributors: set[str] = load_distributors()
|
||||||
self.language_tokens: set[str] = load_language_tokens()
|
self.language_tokens: set[str] = load_language_tokens()
|
||||||
self.forbidden_chars: set[str] = load_forbidden_chars()
|
self.forbidden_chars: set[str] = load_forbidden_chars()
|
||||||
self.hdr_extra: set[str] = load_hdr_extra()
|
self.hdr_extra: set[str] = load_hdr_extra()
|
||||||
@@ -59,6 +87,13 @@ class YamlReleaseKnowledge:
|
|||||||
|
|
||||||
self.separators: list[str] = load_separators()
|
self.separators: list[str] = load_separators()
|
||||||
|
|
||||||
|
# Parse-scoring config (weights / penalties / thresholds).
|
||||||
|
self.scoring: dict = load_scoring()
|
||||||
|
|
||||||
|
# ffprobe → scene-token mapping tables (consumed by
|
||||||
|
# ``application.release.enrich_from_probe``).
|
||||||
|
self.probe_mappings: dict = load_probe_mappings()
|
||||||
|
|
||||||
# File-extension sets (used by application/infra modules, not by
|
# File-extension sets (used by application/infra modules, not by
|
||||||
# the parser itself — kept here so there is a single ownership
|
# the parser itself — kept here so there is a single ownership
|
||||||
# point for release knowledge).
|
# point for release knowledge).
|
||||||
@@ -78,6 +113,15 @@ class YamlReleaseKnowledge:
|
|||||||
"", "", "".join(load_win_forbidden_chars())
|
"", "", "".join(load_win_forbidden_chars())
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Group schemas, keyed by uppercase group name for fast lookup.
|
||||||
|
self._group_schemas: dict[str, GroupSchema] = {
|
||||||
|
key: _build_group_schema(data)
|
||||||
|
for key, data in load_group_schemas().items()
|
||||||
|
}
|
||||||
|
|
||||||
def sanitize_for_fs(self, text: str) -> str:
|
def sanitize_for_fs(self, text: str) -> str:
|
||||||
"""Strip Windows-forbidden characters from ``text``."""
|
"""Strip Windows-forbidden characters from ``text``."""
|
||||||
return text.translate(self._win_forbidden_table)
|
return text.translate(self._win_forbidden_table)
|
||||||
|
|
||||||
|
def group_schema(self, name: str) -> GroupSchema | None:
|
||||||
|
return self._group_schemas.get(name.upper())
|
||||||
|
|||||||
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
||||||
from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
|
from alfred.domain.shared.ports import LanguageRepository
|
||||||
from alfred.domain.subtitles.value_objects import (
|
from alfred.domain.subtitles.value_objects import (
|
||||||
ScanStrategy,
|
ScanStrategy,
|
||||||
SubtitleFormat,
|
SubtitleFormat,
|
||||||
@@ -12,6 +12,8 @@ from alfred.domain.subtitles.value_objects import (
|
|||||||
SubtitleType,
|
SubtitleType,
|
||||||
TypeDetectionMethod,
|
TypeDetectionMethod,
|
||||||
)
|
)
|
||||||
|
from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
|
||||||
|
|
||||||
from .loader import KnowledgeLoader
|
from .loader import KnowledgeLoader
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
@@ -28,10 +30,12 @@ class SubtitleKnowledgeBase:
|
|||||||
def __init__(
|
def __init__(
|
||||||
self,
|
self,
|
||||||
loader: KnowledgeLoader | None = None,
|
loader: KnowledgeLoader | None = None,
|
||||||
language_registry: LanguageRegistry | None = None,
|
language_registry: LanguageRepository | None = None,
|
||||||
):
|
):
|
||||||
self._loader = loader or KnowledgeLoader()
|
self._loader = loader or KnowledgeLoader()
|
||||||
self._language_registry = language_registry or LanguageRegistry()
|
self._language_registry: LanguageRepository = (
|
||||||
|
language_registry or LanguageRegistry()
|
||||||
|
)
|
||||||
self._build()
|
self._build()
|
||||||
|
|
||||||
def _build(self) -> None: # noqa: PLR0912 — straight-line YAML projection
|
def _build(self) -> None: # noqa: PLR0912 — straight-line YAML projection
|
||||||
|
|||||||
@@ -7,12 +7,23 @@ import logging
|
|||||||
import subprocess
|
import subprocess
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
|
||||||
from alfred.domain.shared.ports import SubtitleStreamInfo
|
from alfred.domain.shared.ports import SubtitleStreamInfo
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
_FFPROBE_TIMEOUT_SECONDS = 30
|
_FFPROBE_TIMEOUT_SECONDS = 30
|
||||||
|
|
||||||
|
_FFPROBE_FULL_CMD = [
|
||||||
|
"ffprobe",
|
||||||
|
"-v",
|
||||||
|
"quiet",
|
||||||
|
"-print_format",
|
||||||
|
"json",
|
||||||
|
"-show_streams",
|
||||||
|
"-show_format",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
class FfprobeMediaProber:
|
class FfprobeMediaProber:
|
||||||
"""Inspect media files by shelling out to ``ffprobe``.
|
"""Inspect media files by shelling out to ``ffprobe``.
|
||||||
@@ -63,3 +74,101 @@ class FfprobeMediaProber:
|
|||||||
)
|
)
|
||||||
)
|
)
|
||||||
return streams
|
return streams
|
||||||
|
|
||||||
|
def probe(self, video: Path) -> MediaInfo | None:
|
||||||
|
"""Run ffprobe on ``video`` and return a :class:`MediaInfo`.
|
||||||
|
|
||||||
|
Returns ``None`` when ffprobe is not available, times out, or
|
||||||
|
the file cannot be parsed. Never raises.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
[*_FFPROBE_FULL_CMD, str(video)],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=_FFPROBE_TIMEOUT_SECONDS,
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
except (subprocess.TimeoutExpired, FileNotFoundError) as e:
|
||||||
|
logger.warning("ffprobe failed on %s: %s", video, e)
|
||||||
|
return None
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.warning("ffprobe failed on %s: %s", video, result.stderr.strip())
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(result.stdout)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
logger.warning("ffprobe returned invalid JSON for %s", video)
|
||||||
|
return None
|
||||||
|
|
||||||
|
return _parse_media_info(data)
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_media_info(data: dict) -> MediaInfo:
|
||||||
|
"""Translate raw ffprobe JSON into a :class:`MediaInfo` snapshot."""
|
||||||
|
streams = data.get("streams", [])
|
||||||
|
fmt = data.get("format", {})
|
||||||
|
|
||||||
|
duration_seconds: float | None = None
|
||||||
|
bitrate_kbps: int | None = None
|
||||||
|
if "duration" in fmt:
|
||||||
|
try:
|
||||||
|
duration_seconds = float(fmt["duration"])
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
if "bit_rate" in fmt:
|
||||||
|
try:
|
||||||
|
bitrate_kbps = int(fmt["bit_rate"]) // 1000
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
video_tracks: list[VideoTrack] = []
|
||||||
|
audio_tracks: list[AudioTrack] = []
|
||||||
|
subtitle_tracks: list[SubtitleTrack] = []
|
||||||
|
|
||||||
|
for stream in streams:
|
||||||
|
codec_type = stream.get("codec_type")
|
||||||
|
|
||||||
|
if codec_type == "video":
|
||||||
|
video_tracks.append(
|
||||||
|
VideoTrack(
|
||||||
|
index=stream.get("index", len(video_tracks)),
|
||||||
|
codec=stream.get("codec_name"),
|
||||||
|
width=stream.get("width"),
|
||||||
|
height=stream.get("height"),
|
||||||
|
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
elif codec_type == "audio":
|
||||||
|
audio_tracks.append(
|
||||||
|
AudioTrack(
|
||||||
|
index=stream.get("index", len(audio_tracks)),
|
||||||
|
codec=stream.get("codec_name"),
|
||||||
|
channels=stream.get("channels"),
|
||||||
|
channel_layout=stream.get("channel_layout"),
|
||||||
|
language=stream.get("tags", {}).get("language"),
|
||||||
|
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
elif codec_type == "subtitle":
|
||||||
|
subtitle_tracks.append(
|
||||||
|
SubtitleTrack(
|
||||||
|
index=stream.get("index", len(subtitle_tracks)),
|
||||||
|
codec=stream.get("codec_name"),
|
||||||
|
language=stream.get("tags", {}).get("language"),
|
||||||
|
is_default=stream.get("disposition", {}).get("default", 0) == 1,
|
||||||
|
is_forced=stream.get("disposition", {}).get("forced", 0) == 1,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
return MediaInfo(
|
||||||
|
video_tracks=tuple(video_tracks),
|
||||||
|
audio_tracks=tuple(audio_tracks),
|
||||||
|
subtitle_tracks=tuple(subtitle_tracks),
|
||||||
|
duration_seconds=duration_seconds,
|
||||||
|
bitrate_kbps=bitrate_kbps,
|
||||||
|
)
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ from datetime import UTC, datetime
|
|||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.application.subtitles.placer import PlacedTrack
|
from alfred.application.subtitles.placer import PlacedTrack
|
||||||
from alfred.infrastructure.metadata.store import MetadataStore
|
from alfred.infrastructure.metadata.store import MetadataStore
|
||||||
|
|
||||||
@@ -25,7 +25,7 @@ class SubtitleMetadataStore:
|
|||||||
Subtitle-pipeline view of the per-release `.alfred/metadata.yaml`.
|
Subtitle-pipeline view of the per-release `.alfred/metadata.yaml`.
|
||||||
|
|
||||||
Backed by a generic MetadataStore; this class only knows how to build
|
Backed by a generic MetadataStore; this class only knows how to build
|
||||||
a subtitle_history entry from PlacedTrack/SubtitleCandidate pairs.
|
a subtitle_history entry from PlacedTrack/SubtitleScanResult pairs.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, library_root: Path):
|
def __init__(self, library_root: Path):
|
||||||
@@ -45,7 +45,7 @@ class SubtitleMetadataStore:
|
|||||||
|
|
||||||
def append_history(
|
def append_history(
|
||||||
self,
|
self,
|
||||||
placed_pairs: list[tuple[PlacedTrack, SubtitleCandidate]],
|
placed_pairs: list[tuple[PlacedTrack, SubtitleScanResult]],
|
||||||
season: int | None = None,
|
season: int | None = None,
|
||||||
episode: int | None = None,
|
episode: int | None = None,
|
||||||
release_group: str | None = None,
|
release_group: str | None = None,
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING
|
|||||||
import yaml
|
import yaml
|
||||||
|
|
||||||
from alfred.domain.subtitles.aggregates import SubtitleRuleSet
|
from alfred.domain.subtitles.aggregates import SubtitleRuleSet
|
||||||
from alfred.domain.subtitles.value_objects import RuleScope
|
from alfred.domain.subtitles.value_objects import RuleScope, RuleScopeLevel
|
||||||
|
|
||||||
if TYPE_CHECKING:
|
if TYPE_CHECKING:
|
||||||
from alfred.infrastructure.persistence.memory.ltm.components.subtitle_preferences import (
|
from alfred.infrastructure.persistence.memory.ltm.components.subtitle_preferences import (
|
||||||
@@ -72,7 +72,9 @@ class RuleSetRepository:
|
|||||||
rg_data = _load_yaml(rg_path).get("override", {})
|
rg_data = _load_yaml(rg_path).get("override", {})
|
||||||
if rg_data:
|
if rg_data:
|
||||||
rg_ruleset = SubtitleRuleSet(
|
rg_ruleset = SubtitleRuleSet(
|
||||||
scope=RuleScope(level="release_group", identifier=release_group),
|
scope=RuleScope(
|
||||||
|
level=RuleScopeLevel.RELEASE_GROUP, identifier=release_group
|
||||||
|
),
|
||||||
parent=current,
|
parent=current,
|
||||||
)
|
)
|
||||||
rg_ruleset.override(**_filter_override(rg_data))
|
rg_ruleset.override(**_filter_override(rg_data))
|
||||||
@@ -85,7 +87,7 @@ class RuleSetRepository:
|
|||||||
local_data = _load_yaml(self._alfred_dir / "rules.yaml").get("override", {})
|
local_data = _load_yaml(self._alfred_dir / "rules.yaml").get("override", {})
|
||||||
if local_data:
|
if local_data:
|
||||||
local_ruleset = SubtitleRuleSet(
|
local_ruleset = SubtitleRuleSet(
|
||||||
scope=RuleScope(level="show"),
|
scope=RuleScope(level=RuleScopeLevel.SHOW),
|
||||||
parent=current,
|
parent=current,
|
||||||
)
|
)
|
||||||
local_ruleset.override(**_filter_override(local_data))
|
local_ruleset.override(**_filter_override(local_data))
|
||||||
|
|||||||
@@ -0,0 +1,17 @@
|
|||||||
|
# Known streaming distributor tokens (case-insensitive match).
|
||||||
|
#
|
||||||
|
# These tags identify *which platform* the release was sourced from
|
||||||
|
# (Netflix, Amazon, Disney+, …). Distinct from ``sources.yaml`` which
|
||||||
|
# captures the encoding origin (WEB-DL, BluRay, …). A typical release
|
||||||
|
# carries both: ``Show.S01E01.1080p.NF.WEB-DL.x264-GROUP`` →
|
||||||
|
# source=WEB-DL, distributor=NF.
|
||||||
|
distributors:
|
||||||
|
- NF # Netflix
|
||||||
|
- AMZN # Amazon Prime Video
|
||||||
|
- DSNP # Disney+
|
||||||
|
- HMAX # HBO Max
|
||||||
|
- ATVP # Apple TV+
|
||||||
|
- HULU # Hulu
|
||||||
|
- PCOK # Peacock
|
||||||
|
- PMTP # Paramount+
|
||||||
|
- CR # Crunchyroll
|
||||||
@@ -0,0 +1,45 @@
|
|||||||
|
# Translation table — ffprobe output → scene-style release tokens.
|
||||||
|
#
|
||||||
|
# Consumed by ``alfred.application.release.enrich_from_probe`` when filling
|
||||||
|
# missing ParsedRelease fields from a probed MediaInfo. Token-level values
|
||||||
|
# from the release name always win; these mappings only fire when the
|
||||||
|
# corresponding ParsedRelease field is None.
|
||||||
|
#
|
||||||
|
# Lookup is case-insensitive on the key side (ffprobe sometimes emits
|
||||||
|
# uppercase, sometimes lowercase). When no key matches, the fallback is
|
||||||
|
# ``ffprobe_value.upper()`` so unknown codecs still surface in a
|
||||||
|
# predictable form (and signal the gap to a future "learn" pass).
|
||||||
|
#
|
||||||
|
# Each section is a flat dict — values are the canonical scene tokens
|
||||||
|
# Alfred uses everywhere (filename builders, ParsedRelease fields).
|
||||||
|
|
||||||
|
# ffprobe video codec name → scene codec token
|
||||||
|
video_codec:
|
||||||
|
hevc: x265
|
||||||
|
h264: x264
|
||||||
|
h265: x265
|
||||||
|
av1: AV1
|
||||||
|
vp9: VP9
|
||||||
|
mpeg4: XviD
|
||||||
|
|
||||||
|
# ffprobe audio codec name → scene audio token
|
||||||
|
audio_codec:
|
||||||
|
eac3: EAC3
|
||||||
|
ac3: AC3
|
||||||
|
dts: DTS
|
||||||
|
truehd: TrueHD
|
||||||
|
aac: AAC
|
||||||
|
flac: FLAC
|
||||||
|
opus: OPUS
|
||||||
|
mp3: MP3
|
||||||
|
pcm_s16le: PCM
|
||||||
|
pcm_s24le: PCM
|
||||||
|
|
||||||
|
# Channel count (integer) → standard layout string.
|
||||||
|
# Keys are strings here because YAML mappings prefer string keys; the
|
||||||
|
# loader normalizes them back to int.
|
||||||
|
audio_channels:
|
||||||
|
"8": "7.1"
|
||||||
|
"6": "5.1"
|
||||||
|
"2": "2.0"
|
||||||
|
"1": "1.0"
|
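The lookup rules stated in the header comment of the mapping file (case-insensitive keys, uppercase fallback for unknown values, channel keys normalized back to integers) might look roughly like the sketch below. The function names `map_probe_value` and `normalize_channel_keys` are assumptions made for illustration; the actual loader and `enrich_from_probe` are not shown in this diff and may be organized differently.

```python
from __future__ import annotations


def normalize_channel_keys(raw: dict[str, str]) -> dict[int, str]:
    """Convert YAML string keys ("6") back to integer channel counts (6)."""
    return {int(key): value for key, value in raw.items()}


def map_probe_value(table: dict[str, str], value: str | None) -> str | None:
    """Case-insensitive lookup with an uppercase fallback for unknown values."""
    if value is None:
        return None
    lowered = {key.lower(): token for key, token in table.items()}
    # Unknown values are returned uppercased so they stay visible downstream.
    return lowered.get(value.lower(), value.upper())


# Small excerpts of the tables above, used only for this example.
VIDEO_CODEC = {"hevc": "x265", "h264": "x264", "av1": "AV1"}
AUDIO_CHANNELS = normalize_channel_keys({"8": "7.1", "6": "5.1", "2": "2.0"})

known = map_probe_value(VIDEO_CODEC, "HEVC")
unknown = map_probe_value(VIDEO_CODEC, "prores")
```

Here `known` resolves through the table while `unknown` falls back to the uppercased input, which is the "signal the gap" behaviour the comment describes.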
||||||
@@ -0,0 +1,22 @@
|
|||||||
|
# ELiTE release naming schema.
|
||||||
|
#
|
||||||
|
# Examples seen in the wild:
|
||||||
|
# Foundation.S02.1080p.x265-ELiTE (TV season pack, no source)
|
||||||
|
#
|
||||||
|
# ELiTE often omits the source token entirely on TV releases (no WEBRip /
|
||||||
|
# BluRay), going straight from resolution to codec.
|
||||||
|
|
||||||
|
name: ELiTE
|
||||||
|
separator: "."
|
||||||
|
|
||||||
|
chunk_order:
|
||||||
|
- role: title
|
||||||
|
- role: year
|
||||||
|
optional: true
|
||||||
|
- role: season_episode
|
||||||
|
optional: true
|
||||||
|
- role: resolution
|
||||||
|
- role: source
|
||||||
|
optional: true # often absent on TV
|
||||||
|
- role: codec
|
||||||
|
- role: group
|
||||||
@@ -0,0 +1,28 @@
|
|||||||
|
# KONTRAST release naming schema.
|
||||||
|
#
|
||||||
|
# Examples seen in the wild:
|
||||||
|
# Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST (movie)
|
||||||
|
# The.Long.Walk.2025.1080p.WEBRip.x265-KONTRAST (movie)
|
||||||
|
# Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST (TV episode)
|
||||||
|
# Slow.Horses.S05.1080p.WEBRip.x265-KONTRAST (TV season pack)
|
||||||
|
#
|
||||||
|
# Schema is a left-to-right description of the canonical chunk order.
|
||||||
|
# Each entry is a role (matching TokenRole). Optional chunks are marked
|
||||||
|
# with `optional: true`. The parser consumes tokens greedily by role,
|
||||||
|
# skipping over optional chunks that don't match.
|
||||||
|
|
||||||
|
name: KONTRAST
|
||||||
|
separator: "."
|
||||||
|
|
||||||
|
# Canonical order of structural + technical chunks (left to right).
|
||||||
|
# `title` is special-cased as "everything up to the first non-title role".
|
||||||
|
chunk_order:
|
||||||
|
- role: title
|
||||||
|
- role: year
|
||||||
|
optional: true # absent on TV releases (S01E01 instead)
|
||||||
|
- role: season_episode
|
||||||
|
optional: true # absent on movies
|
||||||
|
- role: resolution # always present (1080p, 2160p, …)
|
||||||
|
- role: source # always present (WEBRip, BluRay, …)
|
||||||
|
- role: codec # always present (x265, x264, …)
|
||||||
|
- role: group # everything after the final `-`
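The greedy, role-by-role consumption described in the schema comments can be sketched as follows. This is an illustration under stated assumptions, not the parser's implementation: the regex classifiers in `MATCHERS` and the `match_schema` function are hypothetical, and the real parser annotates tokens from the YAML knowledge base rather than from hard-coded patterns.

```python
from __future__ import annotations

import re

# Hypothetical token classifiers, for illustration only.
MATCHERS = {
    "year": re.compile(r"^(19|20)\d{2}$"),
    "season_episode": re.compile(r"^S\d{2}(E\d{2})?$", re.IGNORECASE),
    "resolution": re.compile(r"^\d{3,4}p$", re.IGNORECASE),
    "source": re.compile(r"^(WEBRip|WEB-DL|BluRay)$", re.IGNORECASE),
    "codec": re.compile(r"^(x264|x265)$", re.IGNORECASE),
}


def _role_of(token: str) -> str | None:
    for role, pattern in MATCHERS.items():
        if pattern.match(token):
            return role
    return None


def match_schema(name: str, chunk_order: list[dict], separator: str = ".") -> dict | None:
    # The group is everything after the final "-".
    body, _, group = name.rpartition("-")
    if not body:
        return None
    tokens = body.split(separator)

    # The title is everything up to the first token with a non-title role.
    first = next((i for i, tok in enumerate(tokens) if _role_of(tok)), len(tokens))
    if first == 0:
        return None
    result = {"title": separator.join(tokens[:first]), "group": group}

    pos = first
    for chunk in chunk_order:
        role = chunk["role"]
        if role in ("title", "group"):
            continue  # handled above
        if pos < len(tokens) and MATCHERS[role].match(tokens[pos]):
            result[role] = tokens[pos]
            pos += 1
        elif not chunk.get("optional", False):
            return None  # a required chunk is missing
    # Leftover tokens mean the name does not fit the schema.
    return result if pos == len(tokens) else None


KONTRAST = [
    {"role": "title"},
    {"role": "year", "optional": True},
    {"role": "season_episode", "optional": True},
    {"role": "resolution"},
    {"role": "source"},
    {"role": "codec"},
    {"role": "group"},
]
# Same order, but with the source chunk optional (as in the ELiTE schema).
ELITE = [dict(c, optional=True) if c["role"] == "source" else c for c in KONTRAST]

movie = match_schema("Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", KONTRAST)
pack = match_schema("Foundation.S02.1080p.x265-ELiTE", ELITE)
strict = match_schema("Foundation.S02.1080p.x265-ELiTE", KONTRAST)
```

With the source chunk required, the ELiTE season pack fails to match; with it optional, the same name is accepted. That is the practical reason a per-group `optional` flag exists.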
|
||||||
@@ -0,0 +1,20 @@
|
|||||||
|
# RARBG release naming schema.
|
||||||
|
#
|
||||||
|
# RARBG follows the canonical scene convention closely:
|
||||||
|
# Title.Year.Resolution.Source.Codec-RARBG
|
||||||
|
# For TV:
|
||||||
|
# Title.S01E01.Resolution.Source.Codec-RARBG
|
||||||
|
|
||||||
|
name: RARBG
|
||||||
|
separator: "."
|
||||||
|
|
||||||
|
chunk_order:
|
||||||
|
- role: title
|
||||||
|
- role: year
|
||||||
|
optional: true
|
||||||
|
- role: season_episode
|
||||||
|
optional: true
|
||||||
|
- role: resolution
|
||||||
|
- role: source
|
||||||
|
- role: codec
|
||||||
|
- role: group
|
||||||
@@ -0,0 +1,42 @@
|
|||||||
|
# Release parse scoring.
|
||||||
|
#
|
||||||
|
# `parse_release` returns a `ParseReport` alongside the `ParsedRelease`.
|
||||||
|
# The report carries a 0-100 confidence score computed from the annotated
|
||||||
|
# tokens, plus the road decision (EASY / SHITTY / PATH_OF_PAIN).
|
||||||
|
#
|
||||||
|
# Why YAML: the weights and the SHITTY/PoP cutoff are tuning knobs we
|
||||||
|
# expect to iterate on as fixtures grow. Keeping them in code would
|
||||||
|
# mean a commit per tweak; here the user can adjust without touching
|
||||||
|
# Python.
|
||||||
|
#
|
||||||
|
# Weights are awarded when the corresponding ParsedRelease field is
|
||||||
|
# populated (non-None, non-"UNKNOWN" for group). Season and episode
|
||||||
|
# only contribute when the parse looks like TV (season is not None).
|
||||||
|
|
||||||
|
weights:
|
||||||
|
title: 30 # structural pivot — without it nothing else matters
|
||||||
|
media_type: 20 # movie / tv_show / tv_complete / …
|
||||||
|
year: 15
|
||||||
|
season: 10 # only counted for TV-shaped releases
|
||||||
|
episode: 5
|
||||||
|
resolution: 5
|
||||||
|
source: 5
|
||||||
|
codec: 5
|
||||||
|
group: 5 # "UNKNOWN" yields 0
|
||||||
|
|
||||||
|
# Penalty applied per UNKNOWN token left in the annotated stream.
|
||||||
|
# Capped at `max_unknown_penalty` to keep a long-tail of garbage from
|
||||||
|
# pushing every release into PoP.
|
||||||
|
penalties:
|
||||||
|
unknown_token: 5
|
||||||
|
max_unknown_penalty: 30
|
||||||
|
|
||||||
|
# Decision thresholds.
|
||||||
|
#
|
||||||
|
# EASY is decided structurally (a known group schema matched) — it does
|
||||||
|
# not look at the score. SHITTY vs PATH_OF_PAIN is decided here:
|
||||||
|
#
|
||||||
|
# score >= shitty_min → SHITTY (best-effort parse usable)
|
||||||
|
# score < shitty_min → PATH_OF_PAIN (needs user / LLM help)
|
||||||
|
thresholds:
|
||||||
|
shitty_min: 60
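One way to read the scoring rules above is sketched below. The constants mirror the YAML values, but the function names `score` and `road`, the dict-based field representation, and the clamping to the 0-100 range are assumptions made for this example; the real `ParseReport` computation may differ in detail.

```python
from __future__ import annotations

WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
    "resolution": 5, "source": 5, "codec": 5, "group": 5,
}
UNKNOWN_TOKEN_PENALTY = 5
MAX_UNKNOWN_PENALTY = 30
SHITTY_MIN = 60


def score(fields: dict, unknown_tokens: int) -> int:
    """Sum the weights of populated fields, then subtract the capped penalty."""
    is_tv = fields.get("season") is not None
    total = 0
    for name, weight in WEIGHTS.items():
        value = fields.get(name)
        if value is None:
            continue
        if name == "group" and value == "UNKNOWN":
            continue  # an UNKNOWN group earns nothing
        if name in ("season", "episode") and not is_tv:
            continue  # only counted for TV-shaped releases
        total += weight
    penalty = min(unknown_tokens * UNKNOWN_TOKEN_PENALTY, MAX_UNKNOWN_PENALTY)
    # Clamping is an assumption, based on the stated 0-100 range.
    return max(0, min(100, total - penalty))


def road(score_value: int, schema_matched: bool) -> str:
    """EASY is structural; otherwise the threshold decides."""
    if schema_matched:
        return "EASY"
    return "SHITTY" if score_value >= SHITTY_MIN else "PATH_OF_PAIN"


movie = {
    "title": "Movie", "media_type": "movie", "year": 2020,
    "resolution": "1080p", "source": "WEBRip", "codec": "x265", "group": "GRP",
}
movie_score = score(movie, unknown_tokens=2)
garbage_score = score({"title": "randomthing"}, unknown_tokens=10)
```

The weights sum to 100, so a fully populated TV release with no unknown tokens reaches the top of the range, and the penalty cap keeps a noisy but otherwise well-parsed name from dropping below the threshold on unknown tokens alone.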
|
||||||
@@ -21,3 +21,4 @@ separators:
|
|||||||
- "(" # parenthesis-embedded (year, edition): (2020) (Director's Cut)
|
- "(" # parenthesis-embedded (year, edition): (2020) (Director's Cut)
|
||||||
- ")"
|
- ")"
|
||||||
- "_" # underscore-as-space (old usenet, some Asian releases)
|
- "_" # underscore-as-space (old usenet, some Asian releases)
|
||||||
|
- "｜" # fullwidth vertical bar U+FF5C (CJK release names, occasional decorative use)
|
||||||
|
|||||||
@@ -1,4 +1,9 @@
|
|||||||
# Known release source tokens (case-insensitive match)
|
# Known release source tokens (case-insensitive match).
|
||||||
|
#
|
||||||
|
# "Source" here means the capture/encoding origin (disc, broadcast, web
|
||||||
|
# stream) — NOT the streaming distributor (Netflix, Disney+, …). Those
|
||||||
|
# live in ``distributors.yaml`` because they're a separate dimension:
|
||||||
|
# a release is typically "WEB-DL from NF" — both should be captured.
|
||||||
sources:
|
sources:
|
||||||
- bluray
|
- bluray
|
||||||
- blu-ray
|
- blu-ray
|
||||||
@@ -14,8 +19,3 @@ sources:
|
|||||||
- dvdrip
|
- dvdrip
|
||||||
- dvd
|
- dvd
|
||||||
- vodrip
|
- vodrip
|
||||||
- amzn
|
|
||||||
- nf
|
|
||||||
- dsnp
|
|
||||||
- hmax
|
|
||||||
- atvp
|
|
||||||
|
|||||||
@@ -37,12 +37,6 @@ class Settings(BaseSettings):
|
|||||||
llm_temperature: float = 0.2
|
llm_temperature: float = 0.2
|
||||||
data_storage_dir: str = "data"
|
data_storage_dir: str = "data"
|
||||||
|
|
||||||
# --- MEDIA ---
|
|
||||||
# Minimum file size to consider a video file as a real movie (in bytes).
|
|
||||||
# 100 MB is generous enough to skip sample clips / trailers without rejecting
|
|
||||||
# legitimate low-bitrate releases (e.g. older anime, certain web rips).
|
|
||||||
min_movie_size_bytes: int = 100 * 1024 * 1024
|
|
||||||
|
|
||||||
# --- BUILD ---
|
# --- BUILD ---
|
||||||
alfred_version: str | None = None
|
alfred_version: str | None = None
|
||||||
|
|
||||||
@@ -90,15 +84,6 @@ class Settings(BaseSettings):
|
|||||||
)
|
)
|
||||||
return v
|
return v
|
||||||
|
|
||||||
@field_validator("min_movie_size_bytes")
|
|
||||||
@classmethod
|
|
||||||
def validate_min_movie_size(cls, v: int) -> int:
|
|
||||||
if v < 0:
|
|
||||||
raise ConfigurationError(
|
|
||||||
f"min_movie_size_bytes must be non-negative, got {v}"
|
|
||||||
)
|
|
||||||
return v
|
|
||||||
|
|
||||||
@field_validator("request_timeout")
|
@field_validator("request_timeout")
|
||||||
@classmethod
|
@classmethod
|
||||||
def validate_timeout(cls, v: int) -> int:
|
def validate_timeout(cls, v: int) -> int:
|
||||||
|
|||||||
@@ -88,13 +88,13 @@ def analyze(release_name: str, source_path: str | None = None) -> None:
|
|||||||
if not path.exists():
|
if not path.exists():
|
||||||
print(" (path does not exist, probe skipped)")
|
print(" (path does not exist, probe skipped)")
|
||||||
else:
|
else:
|
||||||
from alfred.infrastructure.filesystem.ffprobe import probe
|
|
||||||
from alfred.infrastructure.filesystem.find_video import find_video_file
|
from alfred.infrastructure.filesystem.find_video import find_video_file
|
||||||
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
video = find_video_file(path) if path.is_dir() else path
|
video = find_video_file(path) if path.is_dir() else path
|
||||||
if video:
|
if video:
|
||||||
print(f" video file: {video.name}")
|
print(f" video file: {video.name}")
|
||||||
info = probe(video)
|
info = FfprobeMediaProber().probe(video)
|
||||||
if info:
|
if info:
|
||||||
print(f" codec: {info.video_codec}")
|
print(f" codec: {info.video_codec}")
|
||||||
print(f" resolution: {info.resolution}")
|
print(f" resolution: {info.resolution}")
|
||||||
@@ -124,8 +124,16 @@ def dry_run(release_name: str) -> None:
|
|||||||
from alfred.application.filesystem.resolve_destination import (
|
from alfred.application.filesystem.resolve_destination import (
|
||||||
resolve_season_destination,
|
resolve_season_destination,
|
||||||
)
|
)
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
result = resolve_season_destination(release_name, tmdb_title, tmdb_year)
|
result = resolve_season_destination(
|
||||||
|
release_name,
|
||||||
|
tmdb_title,
|
||||||
|
tmdb_year,
|
||||||
|
YamlReleaseKnowledge(),
|
||||||
|
FfprobeMediaProber(),
|
||||||
|
)
|
||||||
d = result.to_dict()
|
d = result.to_dict()
|
||||||
print()
|
print()
|
||||||
print(json.dumps(d, indent=2, ensure_ascii=False))
|
print(json.dumps(d, indent=2, ensure_ascii=False))
|
||||||
@@ -203,8 +211,16 @@ def do_move(release_name: str, source_folder: str | None = None) -> None:
|
|||||||
from alfred.application.filesystem.resolve_destination import (
|
from alfred.application.filesystem.resolve_destination import (
|
||||||
resolve_season_destination,
|
resolve_season_destination,
|
||||||
)
|
)
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
result = resolve_season_destination(release_name, tmdb_title, tmdb_year)
|
result = resolve_season_destination(
|
||||||
|
release_name,
|
||||||
|
tmdb_title,
|
||||||
|
tmdb_year,
|
||||||
|
YamlReleaseKnowledge(),
|
||||||
|
FfprobeMediaProber(),
|
||||||
|
)
|
||||||
d = result.to_dict()
|
d = result.to_dict()
|
||||||
|
|
||||||
if d["status"] == "needs_clarification":
|
if d["status"] == "needs_clarification":
|
||||||
|
|||||||
@@ -98,9 +98,9 @@ def main() -> None:
|
|||||||
print(c(f"Error: {path} does not exist", RED), file=sys.stderr)
|
print(c(f"Error: {path} does not exist", RED), file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
from alfred.infrastructure.filesystem.ffprobe import probe
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
info = probe(path)
|
info = FfprobeMediaProber().probe(path)
|
||||||
if info is None:
|
if info is None:
|
||||||
print(c("Error: ffprobe failed to probe the file", RED), file=sys.stderr)
|
print(c("Error: ffprobe failed to probe the file", RED), file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|||||||
@@ -100,11 +100,18 @@ def main() -> None:
|
|||||||
print(c(f"Error: {downloads} does not exist", RED), file=sys.stderr)
|
print(c(f"Error: {downloads} does not exist", RED), file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
from alfred.application.filesystem.detect_media_type import detect_media_type
|
from dataclasses import replace
|
||||||
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
|
|
||||||
|
from alfred.application.release.detect_media_type import detect_media_type
|
||||||
|
from alfred.application.release.enrich_from_probe import enrich_from_probe
|
||||||
from alfred.domain.release.services import parse_release
|
from alfred.domain.release.services import parse_release
|
||||||
from alfred.infrastructure.filesystem.ffprobe import probe
|
from alfred.domain.release.value_objects import MediaTypeToken
|
||||||
from alfred.infrastructure.filesystem.find_video import find_video_file
|
from alfred.infrastructure.filesystem.find_video import find_video_file
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
|
_kb = YamlReleaseKnowledge()
|
||||||
|
_prober = FfprobeMediaProber()
|
||||||
|
|
||||||
entries = sorted(downloads.iterdir(), key=lambda p: p.name.lower())
|
entries = sorted(downloads.iterdir(), key=lambda p: p.name.lower())
|
||||||
total = len(entries)
|
total = len(entries)
|
||||||
@@ -121,14 +128,14 @@ def main() -> None:
|
|||||||
name = entry.name
|
name = entry.name
|
||||||
|
|
||||||
try:
|
try:
|
||||||
p = parse_release(name)
|
p, _report = parse_release(name, _kb)
|
||||||
p.media_type = detect_media_type(p, entry)
|
p = replace(p, media_type=MediaTypeToken(detect_media_type(p, entry, _kb)))
|
||||||
if p.media_type not in ("unknown", "other"):
|
if p.media_type not in ("unknown", "other"):
|
||||||
video_file = find_video_file(entry)
|
video_file = find_video_file(entry)
|
||||||
if video_file:
|
if video_file:
|
||||||
media_info = probe(video_file)
|
media_info = _prober.probe(video_file)
|
||||||
if media_info:
|
if media_info:
|
||||||
enrich_from_probe(p, media_info)
|
p = enrich_from_probe(p, media_info, _kb)
|
||||||
warnings = _assess(p)
|
warnings = _assess(p)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
warnings = [f"parse error: {e}"]
|
warnings = [f"parse error: {e}"]
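The call sites in this hunk now rebind `p` instead of mutating it, which suggests `ParsedRelease` is treated as an immutable value. A minimal sketch of that pattern, using a hypothetical `Parsed` stand-in and a hypothetical `fill_missing` helper rather than the real `ParsedRelease` and `enrich_from_probe`:

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Parsed:
    # Hypothetical stand-in for ParsedRelease, reduced to three fields.
    title: str
    quality: str | None = None
    codec: str | None = None


def fill_missing(parsed: Parsed, **probed: str | None) -> Parsed:
    """Return a copy where only fields that are still None are filled."""
    updates = {
        name: value
        for name, value in probed.items()
        if value is not None and getattr(parsed, name) is None
    }
    # dataclasses.replace builds a new instance; the original is untouched.
    return replace(parsed, **updates)


before = Parsed(title="X", codec="x264")
after = fill_missing(before, quality="1080p", codec="x265")
```

In this sketch the codec taken from the release name is kept and only the missing quality is filled, while `before` stays unchanged. A caller that forgets to use the return value silently loses the enrichment, which is why each call site in the diff had to be updated.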
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
"""Tests for ``alfred.application.filesystem.detect_media_type``.
|
"""Tests for ``alfred.application.release.detect_media_type``.
|
||||||
|
|
||||||
The function refines a ``ParsedRelease.media_type`` using filesystem evidence.
|
The function refines a ``ParsedRelease.media_type`` using filesystem evidence.
|
||||||
|
|
||||||
@@ -18,7 +18,7 @@ from pathlib import Path
|
|||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from alfred.application.filesystem.detect_media_type import detect_media_type
|
from alfred.application.release.detect_media_type import detect_media_type
|
||||||
from alfred.domain.release.services import parse_release
|
from alfred.domain.release.services import parse_release
|
||||||
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
@@ -28,11 +28,14 @@ _KB = YamlReleaseKnowledge()
|
|||||||
def _parsed(media_type: str = "movie"):
|
def _parsed(media_type: str = "movie"):
|
||||||
"""Build a ParsedRelease with the requested media_type via the real parser."""
|
"""Build a ParsedRelease with the requested media_type via the real parser."""
|
||||||
if media_type == "tv_show":
|
if media_type == "tv_show":
|
||||||
return parse_release("Show.S01E01.1080p-GRP", _KB)
|
parsed, _ = parse_release("Show.S01E01.1080p-GRP", _KB)
|
||||||
|
return parsed
|
||||||
if media_type == "movie":
|
if media_type == "movie":
|
||||||
return parse_release("Movie.2020.1080p-GRP", _KB)
|
parsed, _ = parse_release("Movie.2020.1080p-GRP", _KB)
|
||||||
|
return parsed
|
||||||
# "unknown" / other — feed a name the parser can't classify
|
# "unknown" / other — feed a name the parser can't classify
|
||||||
return parse_release("randomthing", _KB)
|
parsed, _ = parse_release("randomthing", _KB)
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
|
|||||||
@@ -1,8 +1,8 @@
|
|||||||
"""Tests for ``alfred.application.filesystem.enrich_from_probe``.
|
"""Tests for ``alfred.application.release.enrich_from_probe``.
|
||||||
|
|
||||||
The function mutates a ``ParsedRelease`` in place using ffprobe ``MediaInfo``.
|
The function returns a new ``ParsedRelease`` with ``None`` fields filled
|
||||||
Token-level values from the release name always win — only ``None`` fields
|
from ffprobe ``MediaInfo``. Token-level values from the release name
|
||||||
are filled.
|
always win — only ``None`` fields are filled.
|
||||||
|
|
||||||
Coverage:
|
Coverage:
|
||||||
|
|
||||||
@@ -18,9 +18,12 @@ Uses real ``ParsedRelease`` / ``MediaInfo`` instances — no mocking needed.
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
|
from alfred.application.release.enrich_from_probe import enrich_from_probe
|
||||||
from alfred.domain.release.value_objects import ParsedRelease
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
|
from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
def _info_with_video(*, width=None, height=None, codec=None, **rest) -> MediaInfo:
|
def _info_with_video(*, width=None, height=None, codec=None, **rest) -> MediaInfo:
|
||||||
@@ -35,7 +38,7 @@ def _bare(**overrides) -> ParsedRelease:
|
|||||||
"""Build a minimal ParsedRelease with all enrichable fields = None."""
|
"""Build a minimal ParsedRelease with all enrichable fields = None."""
|
||||||
defaults = dict(
|
defaults = dict(
|
||||||
raw="X",
|
raw="X",
|
||||||
normalised="X",
|
clean="X",
|
||||||
title="X",
|
title="X",
|
||||||
title_sanitized="X",
|
title_sanitized="X",
|
||||||
year=None,
|
year=None,
|
||||||
@@ -46,7 +49,6 @@ def _bare(**overrides) -> ParsedRelease:
|
|||||||
source=None,
|
source=None,
|
||||||
codec=None,
|
codec=None,
|
||||||
group="UNKNOWN",
|
group="UNKNOWN",
|
||||||
tech_string="",
|
|
||||||
)
|
)
|
||||||
defaults.update(overrides)
|
defaults.update(overrides)
|
||||||
return ParsedRelease(**defaults)
|
return ParsedRelease(**defaults)
|
||||||
@@ -60,17 +62,17 @@ def _bare(**overrides) -> ParsedRelease:
|
|||||||
class TestQuality:
|
class TestQuality:
|
||||||
def test_fills_when_none(self):
|
def test_fills_when_none(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, _info_with_video(width=1920, height=1080))
|
p = enrich_from_probe(p, _info_with_video(width=1920, height=1080), _KB)
|
||||||
assert p.quality == "1080p"
|
assert p.quality == "1080p"
|
||||||
|
|
||||||
def test_does_not_overwrite_existing(self):
|
def test_does_not_overwrite_existing(self):
|
||||||
p = _bare(quality="2160p")
|
p = _bare(quality="2160p")
|
||||||
enrich_from_probe(p, _info_with_video(width=1920, height=1080))
|
p = enrich_from_probe(p, _info_with_video(width=1920, height=1080), _KB)
|
||||||
assert p.quality == "2160p"
|
assert p.quality == "2160p"
|
||||||
|
|
||||||
def test_no_dims_leaves_none(self):
|
def test_no_dims_leaves_none(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, MediaInfo())
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
assert p.quality is None
|
assert p.quality is None
|
||||||
|
|
||||||
|
|
||||||
@@ -82,27 +84,27 @@ class TestQuality:
|
|||||||
class TestVideoCodec:
|
class TestVideoCodec:
|
||||||
def test_hevc_to_x265(self):
|
def test_hevc_to_x265(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, _info_with_video(codec="hevc"))
|
p = enrich_from_probe(p, _info_with_video(codec="hevc"), _KB)
|
||||||
assert p.codec == "x265"
|
assert p.codec == "x265"
|
||||||
|
|
||||||
def test_h264_to_x264(self):
|
def test_h264_to_x264(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, _info_with_video(codec="h264"))
|
p = enrich_from_probe(p, _info_with_video(codec="h264"), _KB)
|
||||||
assert p.codec == "x264"
|
assert p.codec == "x264"
|
||||||
|
|
||||||
def test_unknown_codec_uppercased(self):
|
def test_unknown_codec_uppercased(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, _info_with_video(codec="weird"))
|
p = enrich_from_probe(p, _info_with_video(codec="weird"), _KB)
|
||||||
assert p.codec == "WEIRD"
|
assert p.codec == "WEIRD"
|
||||||
|
|
||||||
def test_does_not_overwrite_existing(self):
|
def test_does_not_overwrite_existing(self):
|
||||||
p = _bare(codec="HEVC")
|
p = _bare(codec="HEVC")
|
||||||
enrich_from_probe(p, _info_with_video(codec="h264"))
|
p = enrich_from_probe(p, _info_with_video(codec="h264"), _KB)
|
||||||
assert p.codec == "HEVC"
|
assert p.codec == "HEVC"
|
||||||
|
|
||||||
def test_no_codec_leaves_none(self):
|
def test_no_codec_leaves_none(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, MediaInfo())
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
assert p.codec is None
|
assert p.codec is None
|
||||||
|
|
||||||
|
|
||||||
@@ -120,7 +122,7 @@ class TestAudio:
|
|||||||
]
|
]
|
||||||
)
|
)
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.audio_codec == "EAC3"
|
assert p.audio_codec == "EAC3"
|
||||||
assert p.audio_channels == "5.1"
|
assert p.audio_channels == "5.1"
|
||||||
|
|
||||||
@@ -132,32 +134,32 @@ class TestAudio:
|
|||||||
]
|
]
|
||||||
)
|
)
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.audio_codec == "AC3"
|
assert p.audio_codec == "AC3"
|
||||||
assert p.audio_channels == "5.1"
|
assert p.audio_channels == "5.1"
|
||||||
|
|
||||||
def test_channel_count_unknown_falls_back(self):
|
def test_channel_count_unknown_falls_back(self):
|
||||||
info = MediaInfo(audio_tracks=[AudioTrack(0, "aac", 4, "quad", "eng")])
|
info = MediaInfo(audio_tracks=[AudioTrack(0, "aac", 4, "quad", "eng")])
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.audio_channels == "4ch"
|
assert p.audio_channels == "4ch"
|
||||||
|
|
||||||
def test_unknown_audio_codec_uppercased(self):
|
def test_unknown_audio_codec_uppercased(self):
|
||||||
info = MediaInfo(audio_tracks=[AudioTrack(0, "newcodec", 2, "stereo", "eng")])
|
info = MediaInfo(audio_tracks=[AudioTrack(0, "newcodec", 2, "stereo", "eng")])
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.audio_codec == "NEWCODEC"
|
assert p.audio_codec == "NEWCODEC"
|
||||||
|
|
||||||
def test_no_audio_tracks(self):
|
def test_no_audio_tracks(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, MediaInfo())
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
assert p.audio_codec is None
|
assert p.audio_codec is None
|
||||||
assert p.audio_channels is None
|
assert p.audio_channels is None
|
||||||
|
|
||||||
def test_does_not_overwrite_existing_audio_fields(self):
|
def test_does_not_overwrite_existing_audio_fields(self):
|
||||||
info = MediaInfo(audio_tracks=[AudioTrack(0, "ac3", 6, "5.1", "eng")])
|
info = MediaInfo(audio_tracks=[AudioTrack(0, "ac3", 6, "5.1", "eng")])
|
||||||
p = _bare(audio_codec="DTS-HD.MA", audio_channels="7.1")
|
p = _bare(audio_codec="DTS-HD.MA", audio_channels="7.1")
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.audio_codec == "DTS-HD.MA"
|
assert p.audio_codec == "DTS-HD.MA"
|
||||||
assert p.audio_channels == "7.1"
|
assert p.audio_channels == "7.1"
|
||||||
|
|
||||||
@@ -176,8 +178,8 @@ class TestLanguages:
|
|||||||
]
|
]
|
||||||
)
|
)
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.languages == ["eng", "fre"]
|
assert p.languages == ("eng", "fre")
|
||||||
|
|
||||||
def test_skips_und(self):
|
def test_skips_und(self):
|
||||||
info = MediaInfo(
|
info = MediaInfo(
|
||||||
@@ -187,8 +189,8 @@ class TestLanguages:
|
|||||||
]
|
]
|
||||||
)
|
)
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, info)
|
p = enrich_from_probe(p, info, _KB)
|
||||||
assert p.languages == ["eng"]
|
assert p.languages == ("eng",)
|
||||||
|
|
||||||
def test_dedup_against_existing_case_insensitive(self):
|
def test_dedup_against_existing_case_insensitive(self):
|
||||||
# existing token-level languages are typically upper-case ("FRENCH", "ENG")
|
# existing token-level languages are typically upper-case ("FRENCH", "ENG")
|
||||||
@@ -200,13 +202,52 @@ class TestLanguages:
|
|||||||
AudioTrack(1, "aac", 2, "stereo", "fre"),
|
AudioTrack(1, "aac", 2, "stereo", "fre"),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
p = _bare()
|
p = _bare(languages=("ENG",))
|
||||||
p.languages = ["ENG"]
|
p = enrich_from_probe(p, info, _KB)
|
||||||
enrich_from_probe(p, info)
|
|
||||||
# "eng" → upper "ENG" already present → skipped. "fre" → "FRE" new → kept.
|
# "eng" → upper "ENG" already present → skipped. "fre" → "FRE" new → kept.
|
||||||
assert p.languages == ["ENG", "fre"]
|
assert p.languages == ("ENG", "fre")
|
||||||
|
|
||||||
def test_no_audio_tracks_leaves_languages_empty(self):
|
def test_no_audio_tracks_leaves_languages_empty(self):
|
||||||
p = _bare()
|
p = _bare()
|
||||||
enrich_from_probe(p, MediaInfo())
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
assert p.languages == []
|
assert p.languages == ()
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# tech_string #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestTechString:
|
||||||
|
"""tech_string is a derived property on ParsedRelease: it always
|
||||||
|
reflects the current quality/source/codec. Enrichment never writes
|
||||||
|
it directly — it stays in sync by construction."""
|
||||||
|
|
||||||
|
def test_rebuilt_from_filled_quality_and_codec(self):
|
||||||
|
p = _bare()
|
||||||
|
p = enrich_from_probe(
|
||||||
|
p, _info_with_video(width=1920, height=1080, codec="hevc"), _KB
|
||||||
|
)
|
||||||
|
assert p.quality == "1080p"
|
||||||
|
assert p.codec == "x265"
|
||||||
|
assert p.tech_string == "1080p.x265"
|
||||||
|
|
||||||
|
def test_keeps_existing_source_when_enriching(self):
|
||||||
|
# Token-level source must stay; probe fills only None fields.
|
||||||
|
p = _bare(source="BluRay")
|
||||||
|
p = enrich_from_probe(
|
||||||
|
p, _info_with_video(width=1920, height=1080, codec="hevc"), _KB
|
||||||
|
)
|
||||||
|
assert p.tech_string == "1080p.BluRay.x265"
|
||||||
|
|
||||||
|
def test_unchanged_when_no_enrichable_video_info(self):
|
||||||
|
# No video info → nothing to fill → derived tech_string stays as it was.
|
||||||
|
p = _bare(quality="2160p", source="WEB-DL", codec="x265")
|
||||||
|
assert p.tech_string == "2160p.WEB-DL.x265"
|
||||||
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
|
assert p.tech_string == "2160p.WEB-DL.x265"
|
||||||
|
|
||||||
|
def test_empty_when_nothing_known(self):
|
||||||
|
p = _bare()
|
||||||
|
p = enrich_from_probe(p, MediaInfo(), _KB)
|
||||||
|
assert p.tech_string == ""
|
||||||
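Review note: the tests above pin two behaviours: enrichment returns a new object and fills only `None` fields, and `tech_string` is derived rather than stored. A minimal sketch of that pattern follows. `Release` and `fill_missing` are hypothetical stand-ins, not the real `ParsedRelease` or `enrich_from_probe`, and the dot-joined quality/source/codec order is inferred from the assertions in `TestTechString`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for ParsedRelease; only three fields are modelled."""

    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot drift from the fields.
        parts = (self.quality, self.source, self.codec)
        return ".".join(p for p in parts if p)


def fill_missing(release: Release, **probed: Optional[str]) -> Release:
    """Return a copy in which only fields that are currently None are filled."""
    updates = {
        name: value
        for name, value in probed.items()
        if value is not None and getattr(release, name) is None
    }
    # Token-level values win: a field that is already set is never in `updates`.
    return replace(release, **updates) if updates else release


original = Release(source="BluRay")
enriched = fill_missing(original, quality="1080p", codec="x265", source="WEB-DL")
print(enriched.tech_string)
```

Because the dataclass is frozen, callers have to rebind the result (`p = enrich(...)`), which is the shape the updated tests use.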
|
|||||||
@@ -0,0 +1,356 @@
|
|||||||
|
"""Tests for the ``inspect_release`` orchestrator (Phase C).
|
||||||
|
|
||||||
|
Covers the four composition steps as a black box: a real
|
||||||
|
``YamlReleaseKnowledge``, real on-disk filesystem under ``tmp_path``,
|
||||||
|
and a stubbed ``MediaProber`` so we don't depend on a system ``ffprobe``.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.application.release import InspectedResult, inspect_release
|
||||||
|
from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
_MOVIE_NAME = "Inception.2010.1080p.BluRay.x264-GROUP"
|
||||||
|
_TV_NAME = "Dexter.S01E01.1080p.WEB-DL.x264-GROUP"
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Test doubles #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class _StubProber:
|
||||||
|
"""Minimal MediaProber stub. Records the path it was asked to probe."""
|
||||||
|
|
||||||
|
def __init__(self, info: MediaInfo | None) -> None:
|
||||||
|
self._info = info
|
||||||
|
self.calls: list[Path] = []
|
||||||
|
|
||||||
|
def list_subtitle_streams(self, video: Path): # pragma: no cover - unused here
|
||||||
|
return []
|
||||||
|
|
||||||
|
def probe(self, video: Path) -> MediaInfo | None:
|
||||||
|
self.calls.append(video)
|
||||||
|
return self._info
|
||||||
|
|
||||||
|
|
||||||
|
class _RaisingProber:
|
||||||
|
"""A prober that would explode if called — used to assert no probe."""
|
||||||
|
|
||||||
|
def list_subtitle_streams(self, video: Path): # pragma: no cover
|
||||||
|
raise AssertionError("list_subtitle_streams must not be called")
|
||||||
|
|
||||||
|
def probe(self, video: Path): # pragma: no cover
|
||||||
|
raise AssertionError("probe must not be called")
|
||||||
|
|
||||||
|
|
||||||
|
def _media_info_1080p_h264() -> MediaInfo:
|
||||||
|
return MediaInfo(
|
||||||
|
video_tracks=(VideoTrack(index=0, codec="h264", width=1920, height=1080),),
|
||||||
|
audio_tracks=(
|
||||||
|
AudioTrack(
|
||||||
|
index=1,
|
||||||
|
codec="ac3",
|
||||||
|
channels=6,
|
||||||
|
channel_layout="5.1",
|
||||||
|
language="eng",
|
||||||
|
is_default=True,
|
||||||
|
),
|
||||||
|
),
|
||||||
|
subtitle_tracks=(),
|
||||||
|
duration_seconds=7200.0,
|
||||||
|
bitrate_kbps=8000,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Happy paths #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestInspectMovieFolder:
|
||||||
|
def test_returns_inspected_result_with_all_fields(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
video = folder / "movie.mkv"
|
||||||
|
video.write_bytes(b"")
|
||||||
|
prober = _StubProber(_media_info_1080p_h264())
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert isinstance(result, InspectedResult)
|
||||||
|
assert result.source_path == folder
|
||||||
|
assert result.main_video == video
|
||||||
|
assert result.media_info is not None
|
||||||
|
assert result.probe_used is True
|
||||||
|
assert prober.calls == [video]
|
||||||
|
|
||||||
|
def test_parsed_carries_token_level_fields(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
prober = _StubProber(_media_info_1080p_h264())
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.parsed.title.lower().startswith("inception")
|
||||||
|
assert result.parsed.year == 2010
|
||||||
|
assert result.parsed.group == "GROUP"
|
||||||
|
assert result.parsed.media_type == "movie"
|
||||||
|
|
||||||
|
def test_report_has_confidence_and_road(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
prober = _StubProber(None)
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert 0 <= result.report.confidence <= 100
|
||||||
|
assert result.report.road in ("easy", "shitty", "path_of_pain")
|
||||||
|
|
||||||
|
|
||||||
|
class TestInspectSingleFile:
|
||||||
|
def test_file_is_its_own_main_video(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / f"{_MOVIE_NAME}.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
prober = _StubProber(_media_info_1080p_h264())
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, f, _KB, prober)
|
||||||
|
|
||||||
|
assert result.main_video == f
|
||||||
|
assert result.probe_used is True
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Probe-gating logic #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestProbeGating:
|
||||||
|
def test_no_video_means_no_probe(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
# Only a non-video file present.
|
||||||
|
(folder / "readme.txt").write_text("hi")
|
||||||
|
prober = _RaisingProber()
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.main_video is None
|
||||||
|
assert result.media_info is None
|
||||||
|
assert result.probe_used is False
|
||||||
|
|
||||||
|
def test_media_type_other_means_no_probe(self, tmp_path: Path) -> None:
|
||||||
|
# An ISO-only folder gets detect_media_type → "other".
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "disc.iso").write_bytes(b"")
|
||||||
|
prober = _RaisingProber()
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "other"
|
||||||
|
assert result.media_info is None
|
||||||
|
assert result.probe_used is False
|
||||||
|
|
||||||
|
def test_probe_failure_keeps_probe_used_false(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
prober = _StubProber(None) # ffprobe simulated as failing
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.main_video is not None
|
||||||
|
assert result.media_info is None
|
||||||
|
assert result.probe_used is False
|
||||||
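Review note: the three gating tests above can be read as one small decision function. The sketch below is an assumption about the shape of that logic, not the orchestrator's actual code. `gated_probe` and `NON_PROBABLE` are hypothetical names, and the contents of the real `_NON_PROBABLE_MEDIA_TYPES` set are only partly visible in this diff (a comment confirms `"unknown"`; `"other"` is assumed).

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Assumed membership; the diff only confirms "unknown" explicitly.
NON_PROBABLE = frozenset({"other", "unknown"})


def gated_probe(
    media_type: str,
    main_video: Optional[Path],
    probe: Callable[[Path], Optional[dict]],
) -> Tuple[Optional[dict], bool]:
    """Return (media_info, probe_used) without calling probe when gated."""
    if main_video is None or media_type in NON_PROBABLE:
        # Gate closed: the prober is never invoked.
        return None, False
    info = probe(main_video)
    # A probe that returns None counts as "not used".
    return info, info is not None
```

With this shape, a prober that raises on any call (like `_RaisingProber`) is safe to pass whenever the gate is closed.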
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Mutation contract #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestMutationContract:
|
||||||
|
def test_detect_media_type_refines_parsed(self, tmp_path: Path) -> None:
|
||||||
|
# Release name parses to "movie", but folder mixes video + non_video
|
||||||
|
# (e.g. an ISO sitting next to an mkv) → detect_media_type returns
|
||||||
|
# "unknown", which is in _NON_PROBABLE_MEDIA_TYPES → no probe.
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
(folder / "extras.iso").write_bytes(b"")
|
||||||
|
prober = _RaisingProber()
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "unknown"
|
||||||
|
assert result.probe_used is False
|
||||||
|
|
||||||
|
def test_enrich_runs_when_probe_succeeds(self, tmp_path: Path) -> None:
|
||||||
|
# Build a release name with no codec; probe should fill it in.
|
||||||
|
name = "Inception.2010.1080p.BluRay-GROUP"
|
||||||
|
folder = tmp_path / name
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
prober = _StubProber(_media_info_1080p_h264())
|
||||||
|
|
||||||
|
result = inspect_release(name, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.probe_used is True
|
||||||
|
# enrich_from_probe should have filled the missing codec field.
|
||||||
|
assert result.parsed.codec is not None
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Resilience #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestResilience:
|
||||||
|
def test_nonexistent_path_does_not_raise(self, tmp_path: Path) -> None:
|
||||||
|
ghost = tmp_path / "does-not-exist"
|
||||||
|
prober = _RaisingProber()
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, ghost, _KB, prober)
|
||||||
|
|
||||||
|
assert result.main_video is None
|
||||||
|
assert result.media_info is None
|
||||||
|
assert result.probe_used is False
|
||||||
|
|
||||||
|
def test_tv_release_inspection(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _TV_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
video = folder / "episode.mkv"
|
||||||
|
video.write_bytes(b"")
|
||||||
|
prober = _StubProber(_media_info_1080p_h264())
|
||||||
|
|
||||||
|
result = inspect_release(_TV_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "tv_show"
|
||||||
|
assert result.parsed.season == 1
|
||||||
|
assert result.parsed.episode == 1
|
||||||
|
assert result.main_video == video
|
||||||
|
assert result.probe_used is True
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Frozen contract #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestFrozen:
|
||||||
|
def test_inspected_result_is_frozen(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
prober = _StubProber(None)
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, prober)
|
||||||
|
|
||||||
|
# frozen=True → assigning a field raises FrozenInstanceError.
|
||||||
|
import dataclasses
|
||||||
|
|
||||||
|
try:
|
||||||
|
result.probe_used = True # type: ignore[misc]
|
||||||
|
except dataclasses.FrozenInstanceError:
|
||||||
|
pass
|
||||||
|
else: # pragma: no cover
|
||||||
|
raise AssertionError("InspectedResult should be frozen")
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# recommended_action #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestRecommendedAction:
|
||||||
|
"""``recommended_action`` collapses the orchestrator's go / wait /
|
||||||
|
skip decision into a single property. The check ordering is part
|
||||||
|
of the contract (skip wins over ask_user, ask_user wins over
|
||||||
|
process) — see the property docstring."""
|
||||||
|
|
||||||
|
def test_skip_when_no_main_video(self, tmp_path: Path) -> None:
|
||||||
|
# Folder with no video at all → main_video is None → skip.
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "readme.txt").write_text("hi")
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, _RaisingProber())
|
||||||
|
|
||||||
|
assert result.main_video is None
|
||||||
|
assert result.recommended_action == "skip"
|
||||||
|
|
||||||
|
def test_skip_when_media_type_other(self, tmp_path: Path) -> None:
|
||||||
|
# Folder with only non-video files (ISO) → media_type == "other"
|
||||||
|
# AND main_video is None (find_main_video filters by video ext).
|
||||||
|
# Both branches resolve to "skip"; this asserts the contract holds.
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "disc.iso").write_bytes(b"")
|
||||||
|
|
||||||
|
result = inspect_release(_MOVIE_NAME, folder, _KB, _RaisingProber())
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "other"
|
||||||
|
assert result.recommended_action == "skip"
|
||||||
|
|
||||||
|
def test_ask_user_when_media_type_unknown(self, tmp_path: Path) -> None:
|
||||||
|
# Mixed video + non-video → detect_media_type returns "unknown".
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
(folder / "extras.iso").write_bytes(b"")
|
||||||
|
|
||||||
|
result = inspect_release(
|
||||||
|
_MOVIE_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "unknown"
|
||||||
|
assert result.recommended_action == "ask_user"
|
||||||
|
|
||||||
|
def test_ask_user_when_path_of_pain_road(self, tmp_path: Path) -> None:
|
||||||
|
# Malformed name (forbidden chars) → road == "path_of_pain".
|
||||||
|
name = "garbage@#%name"
|
||||||
|
folder = tmp_path / "release"
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
|
||||||
|
result = inspect_release(
|
||||||
|
name, folder, _KB, _StubProber(_media_info_1080p_h264())
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.report.road == "path_of_pain"
|
||||||
|
# main_video is found but the road still flags uncertainty.
|
||||||
|
assert result.main_video is not None
|
||||||
|
assert result.recommended_action == "ask_user"
|
||||||
|
|
||||||
|
def test_process_for_confident_movie(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _MOVIE_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "movie.mkv").write_bytes(b"")
|
||||||
|
|
||||||
|
result = inspect_release(
|
||||||
|
_MOVIE_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "movie"
|
||||||
|
assert result.report.road in ("easy", "shitty")
|
||||||
|
assert result.recommended_action == "process"
|
||||||
|
|
||||||
|
def test_process_for_confident_tv_show(self, tmp_path: Path) -> None:
|
||||||
|
folder = tmp_path / _TV_NAME
|
||||||
|
folder.mkdir()
|
||||||
|
(folder / "episode.mkv").write_bytes(b"")
|
||||||
|
|
||||||
|
result = inspect_release(
|
||||||
|
_TV_NAME, folder, _KB, _StubProber(_media_info_1080p_h264())
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.parsed.media_type == "tv_show"
|
||||||
|
assert result.recommended_action == "process"
|
||||||
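Review note: the `TestRecommendedAction` docstring says the check ordering is part of the contract (skip wins over ask_user, ask_user wins over process). A minimal sketch of that precedence, written as a free function, follows. The name `recommended_action_for` and its parameters are hypothetical, and the conditions are inferred from the test cases above rather than taken from the real property.

```python
from typing import Optional


def recommended_action_for(
    media_type: str, main_video: Optional[str], road: str
) -> str:
    """Collapse the inspection outcome into "skip", "ask_user" or "process"."""
    # 1. Nothing usable to work on: skip wins over everything else.
    if main_video is None or media_type == "other":
        return "skip"
    # 2. A video exists but the classification or the parse is uncertain.
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    # 3. Confident movie or TV show.
    return "process"
```

The order of the `if` statements matters: a release with no main video and a `path_of_pain` road still resolves to `"skip"`, because the first check returns before the second is reached.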
@@ -40,7 +40,7 @@ from alfred.application.filesystem.manage_subtitles import (
|
|||||||
_to_imdb_id,
|
_to_imdb_id,
|
||||||
_to_unresolved_dto,
|
_to_unresolved_dto,
|
||||||
)
|
)
|
||||||
from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleCandidate
|
from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||||
from alfred.application.subtitles.placer import PlacedTrack, PlaceResult
|
from alfred.application.subtitles.placer import PlacedTrack, PlaceResult
|
||||||
from alfred.domain.subtitles.value_objects import (
|
from alfred.domain.subtitles.value_objects import (
|
||||||
ScanStrategy,
|
ScanStrategy,
|
||||||
@@ -63,8 +63,8 @@ def _track(
|
|||||||
is_embedded: bool = False,
|
is_embedded: bool = False,
|
||||||
raw_tokens: list[str] | None = None,
|
raw_tokens: list[str] | None = None,
|
||||||
file_size_kb: float | None = None,
|
file_size_kb: float | None = None,
|
||||||
) -> SubtitleCandidate:
|
) -> SubtitleScanResult:
|
||||||
return SubtitleCandidate(
|
return SubtitleScanResult(
|
||||||
language=lang,
|
language=lang,
|
||||||
format=fmt,
|
format=fmt,
|
||||||
subtitle_type=stype,
|
subtitle_type=stype,
|
||||||
|
|||||||
@@ -31,13 +31,53 @@ from alfred.application.filesystem.resolve_destination import (
|
|||||||
_Clarification,
|
_Clarification,
|
||||||
_find_existing_tvshow_folders,
|
_find_existing_tvshow_folders,
|
||||||
_resolve_series_folder,
|
_resolve_series_folder,
|
||||||
resolve_episode_destination,
|
resolve_episode_destination as _resolve_episode_destination,
|
||||||
resolve_movie_destination,
|
resolve_movie_destination as _resolve_movie_destination,
|
||||||
resolve_season_destination,
|
resolve_season_destination as _resolve_season_destination,
|
||||||
resolve_series_destination,
|
resolve_series_destination as _resolve_series_destination,
|
||||||
)
|
)
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
from alfred.infrastructure.persistence import Memory, set_memory
|
from alfred.infrastructure.persistence import Memory, set_memory
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
|
class _NullProber:
|
||||||
|
"""Default prober stub — never returns probe data."""
|
||||||
|
|
||||||
|
def list_subtitle_streams(self, video): # pragma: no cover
|
||||||
|
return []
|
||||||
|
|
||||||
|
def probe(self, video):
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
_DEFAULT_PROBER = _NullProber()
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_season_destination(*args, prober=None, **kwargs):
|
||||||
|
return _resolve_season_destination(
|
||||||
|
*args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_episode_destination(*args, prober=None, **kwargs):
|
||||||
|
return _resolve_episode_destination(
|
||||||
|
*args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_movie_destination(*args, prober=None, **kwargs):
|
||||||
|
return _resolve_movie_destination(
|
||||||
|
*args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_series_destination(*args, prober=None, **kwargs):
|
||||||
|
return _resolve_series_destination(
|
||||||
|
*args, kb=_KB, prober=prober or _DEFAULT_PROBER, **kwargs
|
||||||
|
)
|
||||||
|
|
||||||
REL_EPISODE = "Oz.S01E01.1080p.WEBRip.x265-KONTRAST"
|
REL_EPISODE = "Oz.S01E01.1080p.WEBRip.x265-KONTRAST"
|
||||||
REL_SEASON = "Oz.S03.1080p.WEBRip.x265-KONTRAST"
|
REL_SEASON = "Oz.S03.1080p.WEBRip.x265-KONTRAST"
|
||||||
REL_MOVIE = "Inception.2010.1080p.BluRay.x265-GROUP"
|
REL_MOVIE = "Inception.2010.1080p.BluRay.x265-GROUP"
|
||||||
@@ -322,6 +362,102 @@ class TestSeries:
|
|||||||
assert out.status == "needs_clarification"
|
assert out.status == "needs_clarification"
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Probe enrichment wiring #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class _StubProber:
|
||||||
|
"""Minimal MediaProber stub used to drive enrich_from_probe."""
|
||||||
|
|
||||||
|
def __init__(self, info):
|
||||||
|
self._info = info
|
||||||
|
|
||||||
|
def list_subtitle_streams(self, video): # pragma: no cover - unused here
|
||||||
|
return []
|
||||||
|
|
||||||
|
def probe(self, video):
|
||||||
|
return self._info
|
||||||
|
|
||||||
|
|
||||||
|
def _stereo_movie_info():
|
||||||
|
"""A MediaInfo that fills quality+codec when the release name omits them."""
|
||||||
|
from alfred.domain.shared.media import AudioTrack, MediaInfo, VideoTrack
|
||||||
|
|
||||||
|
return MediaInfo(
|
||||||
|
video_tracks=(VideoTrack(index=0, codec="hevc", width=1920, height=1080),),
|
||||||
|
audio_tracks=(
|
||||||
|
AudioTrack(
|
||||||
|
index=1,
|
||||||
|
codec="aac",
|
||||||
|
channels=2,
|
||||||
|
channel_layout="stereo",
|
||||||
|
language="eng",
|
||||||
|
is_default=True,
|
||||||
|
),
|
||||||
|
),
|
||||||
|
subtitle_tracks=(),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class TestProbeEnrichmentWiring:
|
||||||
|
"""When source_path/source_file points to a real file, the resolver
|
||||||
|
should pick up ffprobe data via inspect_release and let the enriched
|
||||||
|
tech_string land in the destination name."""
|
||||||
|
|
||||||
|
def test_movie_picks_up_probe_quality(self, cfg_memory, tmp_path):
|
||||||
|
# Release name parses to "movie" but is missing the quality token;
|
||||||
|
# probe must supply 1080p and refresh tech_string.
|
||||||
|
bare_name = "Inception.2010.BluRay.x264-GROUP"
|
||||||
|
video = tmp_path / "movie.mkv"
|
||||||
|
video.write_bytes(b"")
|
||||||
|
|
||||||
|
out = resolve_movie_destination(
|
||||||
|
bare_name,
|
||||||
|
str(video),
|
||||||
|
"Inception",
|
||||||
|
2010,
|
||||||
|
prober=_StubProber(_stereo_movie_info()),
|
||||||
|
)
|
||||||
|
|
||||||
|
assert out.status == "ok"
|
||||||
|
# tech_string -> "1080p.BluRay.x264" -> "1080p" shows up in names.
|
||||||
|
assert "1080p" in out.movie_folder_name
|
||||||
|
assert "1080p" in out.filename
|
||||||
|
|
||||||
|
def test_movie_skips_probe_when_path_missing(self, cfg_memory):
|
||||||
|
# If the file doesn't exist, no probe runs (the stub would have
|
||||||
|
# injected 1080p — its absence proves the skip).
|
||||||
|
out = resolve_movie_destination(
|
||||||
|
"Inception.2010.BluRay.x264-GROUP",
|
||||||
|
"/nowhere/m.mkv",
|
||||||
|
"Inception",
|
||||||
|
2010,
|
||||||
|
prober=_StubProber(_stereo_movie_info()),
|
||||||
|
)
|
||||||
|
assert out.status == "ok"
|
||||||
|
assert "1080p" not in out.movie_folder_name
|
||||||
|
|
||||||
|
def test_season_picks_up_probe_via_source_path(self, cfg_memory, tmp_path):
|
||||||
|
# Season pack name missing quality token; probe must add it.
|
||||||
|
bare_name = "Oz.S03.BluRay.x265-KONTRAST"
|
||||||
|
release_dir = tmp_path / bare_name
|
||||||
|
release_dir.mkdir()
|
||||||
|
(release_dir / "episode.mkv").write_bytes(b"")
|
||||||
|
|
||||||
|
out = resolve_season_destination(
|
||||||
|
bare_name,
|
||||||
|
"Oz",
|
||||||
|
1997,
|
||||||
|
source_path=str(release_dir),
|
||||||
|
prober=_StubProber(_stereo_movie_info()),
|
||||||
|
)
|
||||||
|
|
||||||
|
assert out.status == "ok"
|
||||||
|
# Series folder name embeds tech_string -> "1080p" surfaced by probe.
|
||||||
|
assert "1080p" in out.series_folder_name
|
||||||
|
|
||||||
|
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
# DTO to_dict() #
|
# DTO to_dict() #
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
|
|||||||
@@ -21,7 +21,7 @@ from unittest.mock import patch
 
 import pytest
 
-from alfred.domain.subtitles.entities import SubtitleCandidate
+from alfred.domain.subtitles.entities import SubtitleScanResult
 from alfred.application.subtitles.placer import (
     PlacedTrack,
     PlaceResult,
@@ -46,8 +46,8 @@ def _track(
     fmt=SRT,
     stype=SubtitleType.STANDARD,
     is_embedded: bool = False,
-) -> SubtitleCandidate:
-    return SubtitleCandidate(
+) -> SubtitleScanResult:
+    return SubtitleScanResult(
         language=lang,
         format=fmt,
         subtitle_type=stype,
||||||
|
|||||||
@@ -0,0 +1,130 @@
|
|||||||
|
"""Tests for the pre-pipeline exclusion helpers (Phase A bis)."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from alfred.application.release.supported_media import (
|
||||||
|
find_main_video,
|
||||||
|
is_supported_video,
|
||||||
|
)
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# is_supported_video #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestIsSupportedVideo:
|
||||||
|
def test_mkv_is_supported(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "movie.mkv"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is True
|
||||||
|
|
||||||
|
def test_mp4_is_supported(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "movie.mp4"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is True
|
||||||
|
|
||||||
|
def test_uppercase_extension_is_supported(self, tmp_path: Path) -> None:
|
||||||
|
# File systems can return mixed case; we lowercase the suffix.
|
||||||
|
f = tmp_path / "movie.MKV"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is True
|
||||||
|
|
||||||
|
def test_srt_is_not_video(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "movie.srt"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is False
|
||||||
|
|
||||||
|
def test_nfo_is_not_video(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "movie.nfo"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is False
|
||||||
|
|
||||||
|
def test_no_extension_is_not_video(self, tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "README"
|
||||||
|
f.touch()
|
||||||
|
assert is_supported_video(f, _KB) is False
|
||||||
|
|
||||||
|
def test_directory_is_not_video(self, tmp_path: Path) -> None:
|
||||||
|
d = tmp_path / "subdir.mkv" # even with a video extension
|
||||||
|
d.mkdir()
|
||||||
|
assert is_supported_video(d, _KB) is False
|
||||||
|
|
||||||
|
def test_nonexistent_path_is_not_video(self, tmp_path: Path) -> None:
|
||||||
|
assert is_supported_video(tmp_path / "ghost.mkv", _KB) is False
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# find_main_video #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestFindMainVideo:
|
||||||
|
def test_single_video_file_in_folder(self, tmp_path: Path) -> None:
|
||||||
|
main = tmp_path / "Movie.2020.mkv"
|
||||||
|
main.touch()
|
||||||
|
assert find_main_video(tmp_path, _KB) == main
|
||||||
|
|
||||||
|
def test_returns_lexicographically_first_among_multiple(
|
||||||
|
self, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
# Legitimate for season packs: pick the first episode by name.
|
||||||
|
ep2 = tmp_path / "Show.S01E02.mkv"
|
||||||
|
ep1 = tmp_path / "Show.S01E01.mkv"
|
||||||
|
ep2.touch()
|
||||||
|
ep1.touch()
|
||||||
|
assert find_main_video(tmp_path, _KB) == ep1
|
||||||
|
|
||||||
|
def test_skips_non_video_files(self, tmp_path: Path) -> None:
|
||||||
|
# nfo and srt come alphabetically before .mkv, must not win.
|
||||||
|
(tmp_path / "Movie.nfo").touch()
|
||||||
|
(tmp_path / "Movie.srt").touch()
|
||||||
|
vid = tmp_path / "Movie.mkv"
|
||||||
|
vid.touch()
|
||||||
|
assert find_main_video(tmp_path, _KB) == vid
|
||||||
|
|
||||||
|
def test_ignores_subdirectories(self, tmp_path: Path) -> None:
|
||||||
|
# A Sample/ subdir must NOT be descended into.
|
||||||
|
sample_dir = tmp_path / "Sample"
|
||||||
|
sample_dir.mkdir()
|
||||||
|
(sample_dir / "sample.mkv").touch()
|
||||||
|
main = tmp_path / "Movie.mkv"
|
||||||
|
main.touch()
|
||||||
|
assert find_main_video(tmp_path, _KB) == main
|
||||||
|
|
||||||
|
def test_only_subdirectory_with_video_returns_none(
|
||||||
|
self, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
# No top-level video, only one inside a subdir → None.
|
||||||
|
sub = tmp_path / "Sample"
|
||||||
|
sub.mkdir()
|
||||||
|
(sub / "video.mkv").touch()
|
||||||
|
assert find_main_video(tmp_path, _KB) is None
|
||||||
|
|
||||||
|
def test_empty_folder_returns_none(self, tmp_path: Path) -> None:
|
||||||
|
assert find_main_video(tmp_path, _KB) is None
|
||||||
|
|
||||||
|
def test_nonexistent_folder_returns_none(self, tmp_path: Path) -> None:
|
||||||
|
assert find_main_video(tmp_path / "ghost", _KB) is None
|
||||||
|
|
||||||
|
def test_single_file_release_passed_as_folder_arg(
|
||||||
|
self, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
# Some releases are a bare .mkv with no enclosing folder.
|
||||||
|
f = tmp_path / "Movie.2020.1080p.mkv"
|
||||||
|
f.touch()
|
||||||
|
assert find_main_video(f, _KB) == f
|
||||||
|
|
||||||
|
def test_single_file_non_video_passed_as_folder_arg(
|
||||||
|
self, tmp_path: Path
|
||||||
|
) -> None:
|
||||||
|
f = tmp_path / "README.nfo"
|
||||||
|
f.touch()
|
||||||
|
assert find_main_video(f, _KB) is None
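The behaviour these tests pin down can be sketched roughly as follows. This is a hypothetical re-implementation, not the code in `alfred.application.release.supported_media`: the real helpers take a knowledge-base argument (`_KB`), for which a hard-coded `VIDEO_EXTENSIONS` set stands in here, and the `_sketch` names are invented for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: stand-in for whatever extension list the knowledge base provides.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4"})


def is_supported_video_sketch(path: Path) -> bool:
    """True only for an existing regular file with a known video suffix."""
    # is_file() is False for directories and missing paths, so a folder named
    # "subdir.mkv" and a missing "ghost.mkv" are rejected before the suffix check.
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def find_main_video_sketch(release: Path) -> Path | None:
    """Pick the lexicographically first top-level video, or None."""
    if release.is_file():
        # Single-file release passed where a folder was expected.
        return release if is_supported_video_sketch(release) else None
    if not release.is_dir():
        return None
    # iterdir() does not recurse, so a Sample/ sub-directory is never searched.
    candidates = sorted(
        p for p in release.iterdir() if is_supported_video_sketch(p)
    )
    return candidates[0] if candidates else None
```

Sorting the candidates rather than trusting directory order is what makes the "first episode by name" case deterministic across file systems.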
|
||||||
@@ -0,0 +1,216 @@
|
|||||||
|
"""EASY-path tests for the v2 annotate-based pipeline.
|
||||||
|
|
||||||
|
These tests assert that the **v2 pipeline itself** produces the correct
|
||||||
|
annotated stream and assembled fields for releases from known groups
|
||||||
|
(KONTRAST, ELiTE, …) — without going through ``parse_release``. The
|
||||||
|
fixtures suite (``tests/domain/test_release_fixtures.py``) already
|
||||||
|
locks the user-visible ``ParsedRelease`` contract; here we cover the
|
||||||
|
internal pipeline behavior so a future refactor of ``parse_release``
|
||||||
|
can't quietly drop EASY without us noticing.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from alfred.domain.release.parser import TokenRole
|
||||||
|
from alfred.domain.release.parser.pipeline import (
|
||||||
|
_detect_group,
|
||||||
|
annotate,
|
||||||
|
assemble,
|
||||||
|
tokenize,
|
||||||
|
)
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
|
class TestDetectGroup:
|
||||||
|
def test_codec_group(self) -> None:
|
||||||
|
tokens, _ = tokenize(
|
||||||
|
"Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB
|
||||||
|
)
|
||||||
|
name, idx = _detect_group(tokens, _KB)
|
||||||
|
assert name == "KONTRAST"
|
||||||
|
assert idx == 6 # x265-KONTRAST is the 7th token
|
||||||
|
|
||||||
|
def test_unknown_when_no_dash(self) -> None:
|
||||||
|
tokens, _ = tokenize("Some.Movie.2020.1080p.WEBRip.x265.KONTRAST", _KB)
|
||||||
|
# No dash anywhere → no group detected.
|
||||||
|
name, idx = _detect_group(tokens, _KB)
|
||||||
|
assert idx is None
|
||||||
|
assert name == "UNKNOWN"
|
||||||
|
|
||||||
|
def test_skips_dashed_source(self) -> None:
|
||||||
|
# "Web-DL" must not be mistaken for a group token.
|
||||||
|
tokens, _ = tokenize("Movie.2020.1080p.Web-DL.x265-GRP", _KB)
|
||||||
|
name, idx = _detect_group(tokens, _KB)
|
||||||
|
assert name == "GRP"
|
||||||
|
|
||||||
|
|
||||||
|
class TestAnnotateEasy:
|
||||||
|
def test_kontrast_movie(self) -> None:
|
||||||
|
tokens, tag = tokenize(
|
||||||
|
"Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB
|
||||||
|
)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None, "KONTRAST should hit the EASY path"
|
||||||
|
|
||||||
|
roles = [t.role for t in annotated]
|
||||||
|
assert roles == [
|
||||||
|
TokenRole.TITLE, # Back
|
||||||
|
TokenRole.TITLE, # in
|
||||||
|
TokenRole.TITLE, # Action
|
||||||
|
TokenRole.YEAR,
|
||||||
|
TokenRole.RESOLUTION,
|
||||||
|
TokenRole.SOURCE,
|
||||||
|
TokenRole.CODEC, # x265-KONTRAST → CODEC with extra.group=KONTRAST
|
||||||
|
]
|
||||||
|
assert annotated[-1].extra["group"] == "KONTRAST"
|
||||||
|
assert annotated[-1].extra["codec"] == "x265"
|
||||||
|
|
||||||
|
def test_kontrast_tv_episode(self) -> None:
|
||||||
|
tokens, _ = tokenize(
|
||||||
|
"Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST", _KB
|
||||||
|
)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
|
||||||
|
# Year is optional and absent here, so it is skipped; SEASON_EPISODE is present.
|
||||||
|
roles = [t.role for t in annotated]
|
||||||
|
assert TokenRole.SEASON_EPISODE in roles
|
||||||
|
assert TokenRole.YEAR not in roles
|
||||||
|
|
||||||
|
def test_elite_no_source(self) -> None:
|
||||||
|
# ELiTE schema marks source as optional — Foundation.S02 omits it.
|
||||||
|
tokens, _ = tokenize("Foundation.S02.1080p.x265-ELiTE", _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None, "ELiTE optional source must be tolerated"
|
||||||
|
|
||||||
|
roles = [t.role for t in annotated]
|
||||||
|
assert TokenRole.SOURCE not in roles
|
||||||
|
assert TokenRole.RESOLUTION in roles
|
||||||
|
assert TokenRole.CODEC in roles
|
||||||
|
|
||||||
|
def test_unknown_group_falls_to_shitty(self) -> None:
|
||||||
|
tokens, _ = tokenize("Some.Movie.2020.1080p.WEBRip.x264-RANDOM", _KB)
|
||||||
|
# RANDOM is not in our release_groups/ — annotate() now falls
|
||||||
|
# through to the in-pipeline SHITTY pass and returns a populated
|
||||||
|
# token list (no None sentinel anymore).
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
roles = [t.role for t in annotated]
|
||||||
|
# Title is "Some.Movie", then YEAR, RESOLUTION, SOURCE, CODEC
|
||||||
|
# carrying the group in extra.
|
||||||
|
assert TokenRole.TITLE in roles
|
||||||
|
assert TokenRole.YEAR in roles
|
||||||
|
assert TokenRole.RESOLUTION in roles
|
||||||
|
assert TokenRole.SOURCE in roles
|
||||||
|
assert TokenRole.CODEC in roles
|
||||||
|
codec_tok = next(t for t in annotated if t.role is TokenRole.CODEC)
|
||||||
|
assert codec_tok.extra.get("group") == "RANDOM"
|
||||||
|
|
||||||
|
|
||||||
|
class TestAssemble:
|
||||||
|
def test_kontrast_movie_fields(self) -> None:
|
||||||
|
name = "Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["title"] == "Back.in.Action"
|
||||||
|
assert fields["year"] == 2025
|
||||||
|
assert fields["season"] is None
|
||||||
|
assert fields["quality"] == "1080p"
|
||||||
|
assert fields["source"] == "WEBRip"
|
||||||
|
assert fields["codec"] == "x265"
|
||||||
|
assert fields["group"] == "KONTRAST"
|
||||||
|
assert fields["media_type"] == "movie"
|
||||||
|
assert fields["site_tag"] is None
|
||||||
|
|
||||||
|
def test_kontrast_tv_fields(self) -> None:
|
||||||
|
name = "Slow.Horses.S05E01.1080p.WEBRip.x265-KONTRAST"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["title"] == "Slow.Horses"
|
||||||
|
assert fields["year"] is None
|
||||||
|
assert fields["season"] == 5
|
||||||
|
assert fields["episode"] == 1
|
||||||
|
assert fields["media_type"] == "tv_show"
|
||||||
|
assert fields["group"] == "KONTRAST"
|
||||||
|
|
||||||
|
def test_elite_season_pack(self) -> None:
|
||||||
|
name = "Foundation.S02.1080p.x265-ELiTE"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["title"] == "Foundation"
|
||||||
|
assert fields["season"] == 2
|
||||||
|
assert fields["episode"] is None # season pack
|
||||||
|
assert fields["source"] is None # ELiTE omits it
|
||||||
|
assert fields["quality"] == "1080p"
|
||||||
|
assert fields["codec"] == "x265"
|
||||||
|
assert fields["group"] == "ELiTE"
|
||||||
|
|
||||||
|
|
||||||
|
class TestEnrichers:
|
||||||
|
"""Non-positional roles populated alongside the structural walk.
|
||||||
|
|
||||||
|
These releases would have failed the v2 EASY path before the enricher
|
||||||
|
pass landed (leftover unknown tokens would force a fallback). They
|
||||||
|
now succeed in v2 with rich metadata.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def test_bit_depth_and_audio(self) -> None:
|
||||||
|
name = "Back.in.Action.2025.1080p.WEBRip.10bit.DDP.5.1.x265-KONTRAST"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["title"] == "Back.in.Action"
|
||||||
|
assert fields["bit_depth"] == "10bit"
|
||||||
|
assert fields["audio_codec"] == "DDP"
|
||||||
|
assert fields["audio_channels"] == "5.1"
|
||||||
|
|
||||||
|
def test_hdr_sequence(self) -> None:
|
||||||
|
# DV.HDR10 sequence + TrueHD.Atmos sequence + 7.1 channels +
|
||||||
|
# DIRECTORS.CUT edition all in one release.
|
||||||
|
name = (
|
||||||
|
"Some.Movie.2024.DIRECTORS.CUT.2160p.BluRay.DV.HDR10."
|
||||||
|
"TrueHD.Atmos.7.1.x265-KONTRAST"
|
||||||
|
)
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["edition"] == "DIRECTORS.CUT"
|
||||||
|
assert fields["hdr_format"] == "DV.HDR10"
|
||||||
|
assert fields["audio_codec"] == "TrueHD.Atmos"
|
||||||
|
assert fields["audio_channels"] == "7.1"
|
||||||
|
|
||||||
|
def test_multiple_languages(self) -> None:
|
||||||
|
name = "Movie.2020.FRENCH.MULTI.1080p.WEBRip.DTS.HD.MA.5.1.x265-KONTRAST"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["languages"] == ("FRENCH", "MULTI")
|
||||||
|
assert fields["audio_codec"] == "DTS-HD.MA"
|
||||||
|
assert fields["audio_channels"] == "5.1"
|
||||||
|
|
||||||
|
def test_tv_with_language(self) -> None:
|
||||||
|
name = "Show.S01E05.FRENCH.1080p.WEBRip.x265-KONTRAST"
|
||||||
|
tokens, tag = tokenize(name, _KB)
|
||||||
|
annotated = annotate(tokens, _KB)
|
||||||
|
assert annotated is not None
|
||||||
|
fields = assemble(annotated, tag, name, _KB)
|
||||||
|
|
||||||
|
assert fields["title"] == "Show"
|
||||||
|
assert fields["season"] == 1
|
||||||
|
assert fields["episode"] == 5
|
||||||
|
assert fields["languages"] == ("FRENCH",)
|
||||||
|
assert fields["media_type"] == "tv_show"
|
||||||
@@ -0,0 +1,79 @@
|
|||||||
|
"""Scaffolding tests for the v2 parser package.
|
||||||
|
|
||||||
|
These tests lock the **shape** of the new pipeline (token VOs, tokenize
|
||||||
|
output, site-tag stripping) before the annotate step is wired in. They
|
||||||
|
do not check parsed-release output yet — that comes once :func:`annotate`
|
||||||
|
is implemented and the fixtures-based suite switches over.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from alfred.domain.release.parser import Token, TokenRole
|
||||||
|
from alfred.domain.release.parser.pipeline import strip_site_tag, tokenize
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
|
class TestToken:
|
||||||
|
def test_default_role_is_unknown(self) -> None:
|
||||||
|
t = Token(text="1080p", index=3)
|
||||||
|
assert t.role is TokenRole.UNKNOWN
|
||||||
|
assert not t.is_annotated
|
||||||
|
|
||||||
|
def test_with_role_returns_new_instance(self) -> None:
|
||||||
|
t = Token(text="1080p", index=3)
|
||||||
|
promoted = t.with_role(TokenRole.RESOLUTION)
|
||||||
|
assert promoted is not t
|
||||||
|
assert promoted.role is TokenRole.RESOLUTION
|
||||||
|
assert t.role is TokenRole.UNKNOWN # original unchanged (frozen)
|
||||||
|
|
||||||
|
def test_with_role_merges_extra(self) -> None:
|
||||||
|
t = Token(text="x265-KONTRAST", index=5)
|
||||||
|
promoted = t.with_role(TokenRole.CODEC, group="KONTRAST")
|
||||||
|
assert promoted.role is TokenRole.CODEC
|
||||||
|
assert promoted.extra == {"group": "KONTRAST"}
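The three `TestToken` cases describe an immutable value object whose `with_role` returns a copy. A minimal sketch of that shape, assuming a frozen dataclass (the actual `Token` class is not shown in this diff; `TokenSketch`, `RoleSketch` and the field layout are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Any


class RoleSketch(Enum):
    UNKNOWN = "unknown"
    RESOLUTION = "resolution"
    CODEC = "codec"


@dataclass(frozen=True)
class TokenSketch:
    text: str
    index: int
    role: RoleSketch = RoleSketch.UNKNOWN
    extra: dict[str, Any] = field(default_factory=dict)

    @property
    def is_annotated(self) -> bool:
        return self.role is not RoleSketch.UNKNOWN

    def with_role(self, role: RoleSketch, **extra: Any) -> TokenSketch:
        # Build a fresh instance; the receiver is frozen and stays untouched.
        # Existing extra keys are kept, new keyword arguments are merged on top.
        return replace(self, role=role, extra={**self.extra, **extra})
```

Returning a new instance keeps earlier pipeline stages free of side effects: a token list produced by `tokenize` can be annotated without mutating the original list's elements.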
|
||||||
|
|
||||||
|
|
||||||
|
class TestStripSiteTag:
|
||||||
|
def test_no_tag(self) -> None:
|
||||||
|
clean, tag = strip_site_tag("The.Movie.2020.1080p-GRP")
|
||||||
|
assert tag is None
|
||||||
|
assert clean == "The.Movie.2020.1080p-GRP"
|
||||||
|
|
||||||
|
def test_suffix_tag(self) -> None:
|
||||||
|
clean, tag = strip_site_tag("Sinners.2025.1080p-[YTS.MX]")
|
||||||
|
assert tag == "YTS.MX"
|
||||||
|
assert clean == "Sinners.2025.1080p-"
|
||||||
|
|
||||||
|
def test_prefix_tag(self) -> None:
|
||||||
|
clean, tag = strip_site_tag("[ OxTorrent.vc ] The.Title.S01E01")
|
||||||
|
assert tag == "OxTorrent.vc"
|
||||||
|
assert clean == "The.Title.S01E01"
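One way to get the three outcomes asserted in `TestStripSiteTag` is a pair of anchored regular expressions. The sketch below is an assumption about the approach, not the project's `strip_site_tag`; the patterns and the `_sketch` name are illustrative only.

```python
from __future__ import annotations

import re

# Assumption: a site tag is a single [...] group at the very start or very end.
_PREFIX_TAG = re.compile(r"^\[\s*([^\[\]]+?)\s*\]\s*")
_SUFFIX_TAG = re.compile(r"\s*\[\s*([^\[\]]+?)\s*\]$")


def strip_site_tag_sketch(name: str) -> tuple[str, str | None]:
    """Return (name without the bracketed tag, tag or None)."""
    for pattern in (_PREFIX_TAG, _SUFFIX_TAG):
        match = pattern.search(name)
        if match:
            # Cut the bracketed group out; surrounding text is left as-is,
            # which is why a trailing dash survives in the suffix case.
            clean = name[: match.start()] + name[match.end():]
            return clean, match.group(1)
    return name, None
```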
|
||||||
|
|
||||||
|
|
||||||
|
class TestTokenize:
|
||||||
|
def test_simple_release(self) -> None:
|
||||||
|
tokens, tag = tokenize("Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB)
|
||||||
|
assert tag is None
|
||||||
|
texts = [t.text for t in tokens]
|
||||||
|
# Dash is not a separator, so x265-KONTRAST stays glued.
|
||||||
|
assert texts == [
|
||||||
|
"Back", "in", "Action", "2025", "1080p", "WEBRip", "x265-KONTRAST",
|
||||||
|
]
|
||||||
|
|
||||||
|
def test_all_tokens_start_unknown(self) -> None:
|
||||||
|
tokens, _ = tokenize("Back.in.Action.2025.1080p.WEBRip.x265-KONTRAST", _KB)
|
||||||
|
assert all(t.role is TokenRole.UNKNOWN for t in tokens)
|
||||||
|
|
||||||
|
def test_indexes_are_contiguous(self) -> None:
|
||||||
|
tokens, _ = tokenize("A.B.C.D", _KB)
|
||||||
|
assert [t.index for t in tokens] == [0, 1, 2, 3]
|
||||||
|
|
||||||
|
def test_strips_site_tag_before_tokenize(self) -> None:
|
||||||
|
tokens, tag = tokenize(
|
||||||
|
"Sinners.2025.1080p.WEBRip.x265.10bit.AAC5.1-[YTS.MX]", _KB
|
||||||
|
)
|
||||||
|
assert tag == "YTS.MX"
|
||||||
|
# Site tag substring must not appear among tokens.
|
||||||
|
assert not any("YTS" in t.text for t in tokens)
|
||||||
@@ -0,0 +1,279 @@
|
|||||||
|
"""Phase A — parse-confidence scoring.
|
||||||
|
|
||||||
|
These tests pin the score / road semantics without going through
|
||||||
|
fixtures. They exercise the small pure functions in
|
||||||
|
``alfred.domain.release.parser.scoring`` and the end-to-end contract
|
||||||
|
that ``parse_release`` returns a ``(ParsedRelease, ParseReport)`` tuple.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from alfred.domain.release.parser.scoring import (
|
||||||
|
Road,
|
||||||
|
collect_missing_critical,
|
||||||
|
collect_unknown_tokens,
|
||||||
|
compute_score,
|
||||||
|
decide_road,
|
||||||
|
)
|
||||||
|
from alfred.domain.release.parser.tokens import Token, TokenRole
|
||||||
|
from alfred.domain.release.services import parse_release
|
||||||
|
from alfred.domain.release.value_objects import (
|
||||||
|
MediaTypeToken,
|
||||||
|
ParsedRelease,
|
||||||
|
ParseReport,
|
||||||
|
TokenizationRoute,
|
||||||
|
)
|
||||||
|
from alfred.domain.shared.exceptions import ValidationError
|
||||||
|
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||||
|
|
||||||
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# ParseReport VO #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestParseReport:
|
||||||
|
def test_construct_with_defaults(self) -> None:
|
||||||
|
report = ParseReport(confidence=80, road="easy")
|
||||||
|
assert report.confidence == 80
|
||||||
|
assert report.road == "easy"
|
||||||
|
assert report.unknown_tokens == ()
|
||||||
|
assert report.missing_critical == ()
|
||||||
|
|
||||||
|
def test_is_frozen(self) -> None:
|
||||||
|
report = ParseReport(confidence=50, road="shitty")
|
||||||
|
with pytest.raises(Exception): # FrozenInstanceError
|
||||||
|
report.confidence = 99 # type: ignore[misc]
|
||||||
|
|
||||||
|
def test_confidence_lower_bound(self) -> None:
|
||||||
|
with pytest.raises(ValidationError):
|
||||||
|
ParseReport(confidence=-1, road="easy")
|
||||||
|
|
||||||
|
def test_confidence_upper_bound(self) -> None:
|
||||||
|
with pytest.raises(ValidationError):
|
||||||
|
ParseReport(confidence=101, road="easy")
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# compute_score #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
def _movie(year: int = 2020, **overrides) -> ParsedRelease:
|
||||||
|
"""Build a populated movie ParsedRelease for scoring tests."""
|
||||||
|
base = dict(
|
||||||
|
raw="Inception.2010.1080p.BluRay.x264-GROUP",
|
||||||
|
clean="Inception.2010.1080p.BluRay.x264-GROUP",
|
||||||
|
title="Inception",
|
||||||
|
title_sanitized="Inception",
|
||||||
|
year=year,
|
||||||
|
season=None,
|
||||||
|
episode=None,
|
||||||
|
episode_end=None,
|
||||||
|
quality="1080p",
|
||||||
|
source="BluRay",
|
||||||
|
codec="x264",
|
||||||
|
group="GROUP",
|
||||||
|
media_type=MediaTypeToken.MOVIE,
|
||||||
|
parse_path=TokenizationRoute.DIRECT,
|
||||||
|
)
|
||||||
|
base.update(overrides)
|
||||||
|
return ParsedRelease(**base)
|
||||||
|
|
||||||
|
|
||||||
|
def _all_annotated() -> list[Token]:
|
||||||
|
"""Token stream where everything is annotated — zero penalty."""
|
||||||
|
return [
|
||||||
|
Token("Inception", 0, TokenRole.TITLE),
|
||||||
|
Token("2010", 1, TokenRole.YEAR),
|
||||||
|
Token("1080p", 2, TokenRole.RESOLUTION),
|
||||||
|
Token("BluRay", 3, TokenRole.SOURCE),
|
||||||
|
Token("x264", 4, TokenRole.CODEC),
|
||||||
|
Token("GROUP", 5, TokenRole.GROUP),
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
class TestComputeScore:
|
||||||
|
def test_fully_populated_movie_scores_high(self) -> None:
|
||||||
|
parsed = _movie()
|
||||||
|
score = compute_score(parsed, _all_annotated(), _KB)
|
||||||
|
# title 30 + media_type 20 + year 15 + resolution 5 + source 5
|
||||||
|
# + codec 5 + group 5 = 85
|
||||||
|
assert score == 85
|
||||||
|
|
||||||
|
def test_tv_show_gets_season_and_episode_weight(self) -> None:
|
||||||
|
parsed = ParsedRelease(
|
||||||
|
raw="Oz.S01E01.1080p.WEBRip.x265-KONTRAST",
|
||||||
|
clean="Oz.S01E01.1080p.WEBRip.x265-KONTRAST",
|
||||||
|
title="Oz",
|
||||||
|
title_sanitized="Oz",
|
||||||
|
year=None,
|
||||||
|
season=1,
|
||||||
|
episode=1,
|
||||||
|
episode_end=None,
|
||||||
|
quality="1080p",
|
||||||
|
source="WEBRip",
|
||||||
|
codec="x265",
|
||||||
|
group="KONTRAST",
|
||||||
|
media_type=MediaTypeToken.TV_SHOW,
|
||||||
|
parse_path=TokenizationRoute.DIRECT,
|
||||||
|
)
|
||||||
|
tokens = [
|
||||||
|
Token("Oz", 0, TokenRole.TITLE),
|
||||||
|
Token("S01E01", 1, TokenRole.SEASON_EPISODE),
|
||||||
|
Token("1080p", 2, TokenRole.RESOLUTION),
|
||||||
|
Token("WEBRip", 3, TokenRole.SOURCE),
|
||||||
|
Token("x265", 4, TokenRole.CODEC),
|
||||||
|
Token("KONTRAST", 5, TokenRole.GROUP),
|
||||||
|
]
|
||||||
|
score = compute_score(parsed, tokens, _KB)
|
||||||
|
# title 30 + media_type 20 + season 10 + episode 5 + resolution 5
|
||||||
|
# + source 5 + codec 5 + group 5 = 85 (no year)
|
||||||
|
assert score == 85
|
||||||
|
|
||||||
|
def test_unknown_tokens_subtract_penalty(self) -> None:
|
||||||
|
parsed = _movie()
|
||||||
|
tokens = _all_annotated() + [
|
||||||
|
Token("noise", 6, TokenRole.UNKNOWN),
|
||||||
|
Token("more", 7, TokenRole.UNKNOWN),
|
||||||
|
]
|
||||||
|
score = compute_score(parsed, tokens, _KB)
|
||||||
|
# 85 baseline - 2*5 unknown tokens = 75
|
||||||
|
assert score == 75
|
||||||
|
|
||||||
|
def test_unknown_penalty_capped(self) -> None:
|
||||||
|
parsed = _movie()
|
||||||
|
# 20 unknown tokens × 5 = 100 raw, capped at 30
|
||||||
|
tokens = _all_annotated() + [
|
||||||
|
Token(f"t{i}", 6 + i, TokenRole.UNKNOWN) for i in range(20)
|
||||||
|
]
|
||||||
|
score = compute_score(parsed, tokens, _KB)
|
||||||
|
assert score == 85 - 30
|
||||||
|
|
||||||
|
def test_score_clamped_to_zero(self) -> None:
|
||||||
|
# Empty-ish parse with lots of unknown tokens
|
||||||
|
parsed = _movie(year=None, quality=None, source=None, codec=None)
|
||||||
|
tokens = [Token(f"t{i}", i, TokenRole.UNKNOWN) for i in range(10)]
|
||||||
|
score = compute_score(parsed, tokens, _KB)
|
||||||
|
# title 30 + media_type 20 + group 5 = 55, -30 cap = 25
|
||||||
|
# Sanity check: the result must stay within [0, 100], whatever the arithmetic yields.
|
||||||
|
assert 0 <= score <= 100
|
||||||
|
|
||||||
|
def test_unknown_media_type_does_not_count(self) -> None:
|
||||||
|
parsed = _movie(media_type=MediaTypeToken.UNKNOWN)
|
||||||
|
score = compute_score(parsed, _all_annotated(), _KB)
|
||||||
|
# Loses the 20 of media_type vs baseline
|
||||||
|
assert score == 85 - 20
|
||||||
|
|
||||||
|
def test_unknown_group_does_not_count(self) -> None:
|
||||||
|
parsed = _movie(group="UNKNOWN")
|
||||||
|
score = compute_score(parsed, _all_annotated(), _KB)
|
||||||
|
assert score == 85 - 5
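The arithmetic spelled out in the comments of `TestComputeScore` can be reproduced with a small stand-alone function. The weights, the per-token penalty of 5 and the cap of 30 are taken from those comments; everything else is an assumption: the real `compute_score` receives a `ParsedRelease`, a token list and the knowledge base, whereas this sketch takes a plain dict and an unknown-token count.

```python
from __future__ import annotations

# Weights as quoted in the test comments ("resolution" is the `quality` field).
# Assumption: the real values are read from the knowledge base, not hard-coded.
WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
    "quality": 5, "source": 5, "codec": 5, "group": 5,
}
UNKNOWN_PENALTY = 5
UNKNOWN_PENALTY_CAP = 30


def compute_score_sketch(fields: dict[str, object], unknown_count: int) -> int:
    """Sum the weights of populated fields, subtract a capped penalty."""
    score = 0
    for name, weight in WEIGHTS.items():
        value = fields.get(name)
        # Missing values and "UNKNOWN" placeholders (group, media_type) earn nothing.
        if value not in (None, "", "UNKNOWN"):
            score += weight
    score -= min(unknown_count * UNKNOWN_PENALTY, UNKNOWN_PENALTY_CAP)
    # Clamp to the 0..100 range that ParseReport validates.
    return max(0, min(100, score))
```

With this weighting a movie and a TV episode both top out at 85: the movie earns 15 for its year, the episode earns 10 + 5 for season and episode.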
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# decide_road #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestDecideRoad:
|
||||||
|
def test_known_schema_is_easy_regardless_of_score(self) -> None:
|
||||||
|
# Even a terrible score returns EASY when a schema matched.
|
||||||
|
assert decide_road(score=0, has_schema=True, kb=_KB) is Road.EASY
|
||||||
|
|
||||||
|
def test_no_schema_high_score_is_shitty(self) -> None:
|
||||||
|
assert decide_road(score=80, has_schema=False, kb=_KB) is Road.SHITTY
|
||||||
|
|
||||||
|
def test_no_schema_low_score_is_pop(self) -> None:
|
||||||
|
assert decide_road(score=10, has_schema=False, kb=_KB) is Road.PATH_OF_PAIN
|
||||||
|
|
||||||
|
def test_threshold_boundary_is_inclusive(self) -> None:
|
||||||
|
threshold = _KB.scoring["thresholds"]["shitty_min"]
|
||||||
|
assert decide_road(threshold, has_schema=False, kb=_KB) is Road.SHITTY
|
||||||
|
assert (
|
||||||
|
decide_road(threshold - 1, has_schema=False, kb=_KB)
|
||||||
|
is Road.PATH_OF_PAIN
|
||||||
|
)
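The routing rule exercised by `TestDecideRoad` reduces to two checks. In this sketch the threshold is passed in directly, whereas the tests read it from `_KB.scoring["thresholds"]["shitty_min"]`; the enum value for `PATH_OF_PAIN` is a guess, and only `"easy"` and `"shitty"` appear as strings elsewhere in these tests.

```python
from enum import Enum


class RoadSketch(Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"  # assumed value, not confirmed by the diff


def decide_road_sketch(score: int, has_schema: bool, shitty_min: int) -> RoadSketch:
    # A matched group schema wins outright, whatever the score.
    if has_schema:
        return RoadSketch.EASY
    # The threshold is inclusive: score == shitty_min is still SHITTY.
    if score >= shitty_min:
        return RoadSketch.SHITTY
    return RoadSketch.PATH_OF_PAIN
```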
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# Collectors #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestCollectors:
|
||||||
|
def test_collect_unknown_tokens_preserves_order(self) -> None:
|
||||||
|
tokens = [
|
||||||
|
Token("A", 0, TokenRole.TITLE),
|
||||||
|
Token("X", 1, TokenRole.UNKNOWN),
|
||||||
|
Token("B", 2, TokenRole.RESOLUTION),
|
||||||
|
Token("Y", 3, TokenRole.UNKNOWN),
|
||||||
|
]
|
||||||
|
assert collect_unknown_tokens(tokens) == ("X", "Y")
|
||||||
|
|
||||||
|
def test_collect_missing_critical_full(self) -> None:
|
||||||
|
empty = ParsedRelease(
|
||||||
|
raw="x",
|
||||||
|
clean="x",
|
||||||
|
title="",
|
||||||
|
title_sanitized="",
|
||||||
|
year=None,
|
||||||
|
season=None,
|
||||||
|
episode=None,
|
||||||
|
episode_end=None,
|
||||||
|
quality=None,
|
||||||
|
source=None,
|
||||||
|
codec=None,
|
||||||
|
group="UNKNOWN",
|
||||||
|
media_type=MediaTypeToken.UNKNOWN,
|
||||||
|
parse_path=TokenizationRoute.DIRECT,
|
||||||
|
)
|
||||||
|
assert set(collect_missing_critical(empty)) == {
|
||||||
|
"title",
|
||||||
|
"media_type",
|
||||||
|
"year",
|
||||||
|
}
|
||||||
|
|
||||||
|
def test_collect_missing_critical_none(self) -> None:
|
||||||
|
parsed = _movie()
|
||||||
|
assert collect_missing_critical(parsed) == ()
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
# End-to-end contract #
|
||||||
|
# --------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
class TestParseReleaseReturnsReport:
|
||||||
|
def test_returns_tuple(self) -> None:
|
||||||
|
result = parse_release("Inception.2010.1080p.BluRay.x264-GROUP", _KB)
|
||||||
|
assert isinstance(result, tuple)
|
||||||
|
assert len(result) == 2
|
||||||
|
parsed, report = result
|
||||||
|
assert isinstance(parsed, ParsedRelease)
|
||||||
|
assert isinstance(report, ParseReport)
|
||||||
|
|
||||||
|
def test_known_group_is_easy_road(self) -> None:
|
||||||
|
# KONTRAST has a schema in release_groups/
|
||||||
|
_, report = parse_release(
|
||||||
|
"Oz.S03E01.1080p.WEBRip.x265-KONTRAST", _KB
|
||||||
|
)
|
||||||
|
assert report.road == Road.EASY.value
|
||||||
|
assert report.confidence > 0
|
||||||
|
|
||||||
|
def test_unknown_group_well_formed_is_shitty(self) -> None:
|
||||||
|
# No registered schema but well-formed scene name → SHITTY
|
||||||
|
_, report = parse_release(
|
||||||
|
"Inception.2010.1080p.BluRay.x264-NOSCHEMA", _KB
|
||||||
|
)
|
||||||
|
assert report.road == Road.SHITTY.value
|
||||||
|
|
||||||
|
def test_malformed_name_is_pop(self) -> None:
|
||||||
|
# Forbidden chars (@) — short-circuits to AI / PoP.
|
||||||
|
_, report = parse_release("garbage@#%name", _KB)
|
||||||
|
assert report.road == Road.PATH_OF_PAIN.value
|
||||||
|
assert report.confidence == 0
|
||||||
@@ -26,7 +26,8 @@ _KB = YamlReleaseKnowledge()
|
|||||||
|
|
||||||
|
|
||||||
def _parse(name: str) -> ParsedRelease:
|
def _parse(name: str) -> ParsedRelease:
|
||||||
return parse_release(name, _KB)
|
parsed, _report = parse_release(name, _KB)
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
|
||||||
class TestParseTVEpisode:
|
class TestParseTVEpisode:
|
||||||
@@ -263,10 +264,10 @@ class TestParsedReleaseInvariants:
|
|||||||
r = _parse(raw)
|
r = _parse(raw)
|
||||||
assert r.raw == raw
|
assert r.raw == raw
|
||||||
|
|
||||||
def test_languages_defaults_to_empty_list_not_none(self):
|
def test_languages_defaults_to_empty_tuple_not_none(self):
|
||||||
r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
|
r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
|
||||||
# __post_init__ ensures languages is a list, never None
|
# ``languages`` defaults to an empty tuple (frozen VO).
|
||||||
assert r.languages == []
|
assert r.languages == ()
|
||||||
|
|
||||||
def test_tech_string_joined(self):
|
def test_tech_string_joined(self):
|
||||||
r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
|
r = _parse("Movie.2020.1080p.BluRay.x264-GRP")
|
||||||
|
|||||||
@@ -26,19 +26,31 @@ _KB = YamlReleaseKnowledge()
|
|||||||
FIXTURES = discover_fixtures()
|
FIXTURES = discover_fixtures()
|
||||||
|
|
||||||
|
|
||||||
|
def _fixture_param(f: ReleaseFixture) -> pytest.param:
|
||||||
|
marks = []
|
||||||
|
if f.xfail_reason:
|
||||||
|
marks.append(pytest.mark.xfail(reason=f.xfail_reason, strict=False))
|
||||||
|
return pytest.param(f, id=f.name, marks=marks)
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize(
|
@pytest.mark.parametrize(
|
||||||
"fixture",
|
"fixture",
|
||||||
FIXTURES,
|
[_fixture_param(f) for f in FIXTURES],
|
||||||
ids=[f.name for f in FIXTURES],
|
|
||||||
)
|
)
|
||||||
def test_parse_matches_fixture(fixture: ReleaseFixture, tmp_path) -> None:
|
def test_parse_matches_fixture(fixture: ReleaseFixture, tmp_path) -> None:
|
||||||
# Materialize the tree to assert it is at least well-formed YAML +
|
# Materialize the tree to assert it is at least well-formed YAML +
|
||||||
# plausible filesystem paths. Catches typos / missing leading dirs early.
|
# plausible filesystem paths. Catches typos / missing leading dirs early.
|
||||||
fixture.materialize(tmp_path)
|
fixture.materialize(tmp_path)
|
||||||
|
|
||||||
result = asdict(parse_release(fixture.release_name, _KB))
|
parsed, _report = parse_release(fixture.release_name, _KB)
|
||||||
# ``is_season_pack`` is a @property — asdict() does not include it.
|
result = asdict(parsed)
|
||||||
result["is_season_pack"] = parse_release(fixture.release_name, _KB).is_season_pack
|
# ``is_season_pack`` and ``tech_string`` are @property values —
|
||||||
|
# ``asdict()`` does not include them.
|
||||||
|
result["is_season_pack"] = parsed.is_season_pack
|
||||||
|
result["tech_string"] = parsed.tech_string
|
||||||
|
# ``languages`` is a tuple on the VO; fixtures encode it as a YAML list.
|
||||||
|
# Compare list-to-list so the equality is unambiguous.
|
||||||
|
result["languages"] = list(result.get("languages", ()))
|
||||||
|
|
||||||
for field, expected in fixture.expected_parsed.items():
|
for field, expected in fixture.expected_parsed.items():
|
||||||
assert field in result, (
|
assert field in result, (
|
||||||
|
|||||||
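Review note: the fixture test above re-adds `is_season_pack` and `tech_string` by hand and converts `languages` to a list before comparing. A minimal sketch of why that is needed, using a hypothetical `Release` dataclass (a stand-in, not the project's `ParsedRelease`) and an assumed season-pack rule:

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for a frozen value object; not the real class."""

    title: str
    season: Optional[int] = None
    episode: Optional[int] = None
    languages: Tuple[str, ...] = ()

    @property
    def is_season_pack(self) -> bool:
        # Assumed rule for this sketch only: a season with no episode.
        return self.season is not None and self.episode is None


def comparable_dict(release: Release) -> dict:
    """Build a dict that can be compared against YAML-decoded expectations."""
    result = asdict(release)  # dataclass fields only; properties are skipped
    result["is_season_pack"] = release.is_season_pack
    # asdict() keeps tuples as tuples, while YAML sequences load as lists.
    result["languages"] = list(result["languages"])
    return result
```

`asdict()` only walks dataclass fields, so anything exposed through `@property` has to be copied over explicitly, and a tuple never compares equal to a list even when the elements match.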
@@ -23,7 +23,7 @@ from unittest.mock import patch
|
|||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from alfred.domain.shared.ports import FileEntry
|
from alfred.domain.shared.ports import FileEntry
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.domain.subtitles.services.identifier import (
|
from alfred.domain.subtitles.services.identifier import (
|
||||||
SubtitleIdentifier,
|
SubtitleIdentifier,
|
||||||
_count_entries,
|
_count_entries,
|
||||||
@@ -310,8 +310,8 @@ class TestSizeDisambiguation:
|
|||||||
detection=TypeDetectionMethod.SIZE_AND_COUNT,
|
detection=TypeDetectionMethod.SIZE_AND_COUNT,
|
||||||
)
|
)
|
||||||
|
|
||||||
def _track(self, lang_code: str, entries: int) -> SubtitleCandidate:
|
def _track(self, lang_code: str, entries: int) -> SubtitleScanResult:
|
||||||
return SubtitleCandidate(
|
return SubtitleScanResult(
|
||||||
language=SubtitleLanguage(code=lang_code, tokens=[lang_code]),
|
language=SubtitleLanguage(code=lang_code, tokens=[lang_code]),
|
||||||
format=None,
|
format=None,
|
||||||
subtitle_type=SubtitleType.UNKNOWN,
|
subtitle_type=SubtitleType.UNKNOWN,
|
||||||
|
|||||||
@@ -18,7 +18,7 @@ from __future__ import annotations
|
|||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
||||||
from alfred.domain.subtitles.value_objects import (
|
from alfred.domain.subtitles.value_objects import (
|
||||||
SubtitleFormat,
|
SubtitleFormat,
|
||||||
@@ -40,8 +40,8 @@ def _track(
|
|||||||
stype: SubtitleType = SubtitleType.STANDARD,
|
stype: SubtitleType = SubtitleType.STANDARD,
|
||||||
confidence: float = 1.0,
|
confidence: float = 1.0,
|
||||||
is_embedded: bool = False,
|
is_embedded: bool = False,
|
||||||
) -> SubtitleCandidate:
|
) -> SubtitleScanResult:
|
||||||
return SubtitleCandidate(
|
return SubtitleScanResult(
|
||||||
language=lang,
|
language=lang,
|
||||||
format=fmt,
|
format=fmt,
|
||||||
subtitle_type=stype,
|
subtitle_type=stype,
|
||||||
|
|||||||
@@ -5,9 +5,9 @@ uncovered:
|
|||||||
|
|
||||||
- ``TestSubtitleFormat`` — extension matching (case-insensitive).
|
- ``TestSubtitleFormat`` — extension matching (case-insensitive).
|
||||||
- ``TestSubtitleLanguage`` — token matching (case-insensitive).
|
- ``TestSubtitleLanguage`` — token matching (case-insensitive).
|
||||||
- ``TestSubtitleCandidateDestName`` — ``destination_name`` property:
|
- ``TestSubtitleScanResultDestName`` — ``destination_name`` property:
|
||||||
standard / SDH / forced naming, error on missing language or format.
|
standard / SDH / forced naming, error on missing language or format.
|
||||||
- ``TestSubtitleCandidateRepr`` — debug repr for embedded vs external.
|
- ``TestSubtitleScanResultRepr`` — debug repr for embedded vs external.
|
||||||
- ``TestMediaSubtitleMetadata`` — ``all_tracks`` / ``total_count`` /
|
- ``TestMediaSubtitleMetadata`` — ``all_tracks`` / ``total_count`` /
|
||||||
``unresolved_tracks``.
|
``unresolved_tracks``.
|
||||||
- ``TestAvailableSubtitles`` — utility dedup by (lang, type).
|
- ``TestAvailableSubtitles`` — utility dedup by (lang, type).
|
||||||
@@ -24,10 +24,11 @@ from pathlib import Path
|
|||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from alfred.domain.subtitles.aggregates import SubtitleRuleSet
|
from alfred.domain.subtitles.aggregates import SubtitleRuleSet
|
||||||
from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleCandidate
|
from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||||
from alfred.domain.subtitles.services.utils import available_subtitles
|
from alfred.domain.subtitles.services.utils import available_subtitles
|
||||||
from alfred.domain.subtitles.value_objects import (
|
from alfred.domain.subtitles.value_objects import (
|
||||||
RuleScope,
|
RuleScope,
|
||||||
|
RuleScopeLevel,
|
||||||
SubtitleFormat,
|
SubtitleFormat,
|
||||||
SubtitleLanguage,
|
SubtitleLanguage,
|
||||||
SubtitleMatchingRules,
|
SubtitleMatchingRules,
|
||||||
@@ -73,7 +74,7 @@ class TestSubtitleLanguage:
|
|||||||
|
|
||||||
|
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
# SubtitleCandidate #
|
# SubtitleScanResult #
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
@@ -81,50 +82,50 @@ SRT = SubtitleFormat(id="srt", extensions=[".srt"])
|
|||||||
FRA = SubtitleLanguage(code="fra", tokens=["fr", "fre"])
|
FRA = SubtitleLanguage(code="fra", tokens=["fr", "fre"])
|
||||||
|
|
||||||
|
|
||||||
class TestSubtitleCandidateDestName:
|
class TestSubtitleScanResultDestName:
|
||||||
def test_standard(self):
|
def test_standard(self):
|
||||||
t = SubtitleCandidate(
|
t = SubtitleScanResult(
|
||||||
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
||||||
)
|
)
|
||||||
assert t.destination_name == "fra.srt"
|
assert t.destination_name == "fra.srt"
|
||||||
|
|
||||||
def test_sdh(self):
|
def test_sdh(self):
|
||||||
t = SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH)
|
t = SubtitleScanResult(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH)
|
||||||
assert t.destination_name == "fra.sdh.srt"
|
assert t.destination_name == "fra.sdh.srt"
|
||||||
|
|
||||||
def test_forced(self):
|
def test_forced(self):
|
||||||
t = SubtitleCandidate(
|
t = SubtitleScanResult(
|
||||||
language=FRA, format=SRT, subtitle_type=SubtitleType.FORCED
|
language=FRA, format=SRT, subtitle_type=SubtitleType.FORCED
|
||||||
)
|
)
|
||||||
assert t.destination_name == "fra.forced.srt"
|
assert t.destination_name == "fra.forced.srt"
|
||||||
|
|
||||||
def test_unknown_treated_as_standard(self):
|
def test_unknown_treated_as_standard(self):
|
||||||
t = SubtitleCandidate(
|
t = SubtitleScanResult(
|
||||||
language=FRA, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
language=FRA, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
||||||
)
|
)
|
||||||
# UNKNOWN doesn't add a suffix → same as standard.
|
# UNKNOWN doesn't add a suffix → same as standard.
|
||||||
assert t.destination_name == "fra.srt"
|
assert t.destination_name == "fra.srt"
|
||||||
|
|
||||||
def test_missing_language_raises(self):
|
def test_missing_language_raises(self):
|
||||||
t = SubtitleCandidate(language=None, format=SRT)
|
t = SubtitleScanResult(language=None, format=SRT)
|
||||||
with pytest.raises(ValueError, match="language or format missing"):
|
with pytest.raises(ValueError, match="language or format missing"):
|
||||||
t.destination_name
|
t.destination_name
|
||||||
|
|
||||||
def test_missing_format_raises(self):
|
def test_missing_format_raises(self):
|
||||||
t = SubtitleCandidate(language=FRA, format=None)
|
t = SubtitleScanResult(language=FRA, format=None)
|
||||||
with pytest.raises(ValueError, match="language or format missing"):
|
with pytest.raises(ValueError, match="language or format missing"):
|
||||||
t.destination_name
|
t.destination_name
|
||||||
|
|
||||||
def test_extension_dot_stripped(self):
|
def test_extension_dot_stripped(self):
|
||||||
# Format extension is ".srt" — leading dot must not be duplicated.
|
# Format extension is ".srt" — leading dot must not be duplicated.
|
||||||
t = SubtitleCandidate(language=FRA, format=SRT)
|
t = SubtitleScanResult(language=FRA, format=SRT)
|
||||||
assert t.destination_name.endswith(".srt")
|
assert t.destination_name.endswith(".srt")
|
||||||
assert ".." not in t.destination_name
|
assert ".." not in t.destination_name
|
||||||
|
|
||||||
|
|
||||||
class TestSubtitleCandidateRepr:
|
class TestSubtitleScanResultRepr:
|
||||||
def test_embedded_repr(self):
|
def test_embedded_repr(self):
|
||||||
t = SubtitleCandidate(
|
t = SubtitleScanResult(
|
||||||
language=FRA, format=None, is_embedded=True, confidence=1.0
|
language=FRA, format=None, is_embedded=True, confidence=1.0
|
||||||
)
|
)
|
||||||
r = repr(t)
|
r = repr(t)
|
||||||
@@ -134,14 +135,14 @@ class TestSubtitleCandidateRepr:
|
|||||||
def test_external_repr_uses_filename(self, tmp_path):
|
def test_external_repr_uses_filename(self, tmp_path):
|
||||||
f = tmp_path / "fr.srt"
|
f = tmp_path / "fr.srt"
|
||||||
f.write_text("")
|
f.write_text("")
|
||||||
t = SubtitleCandidate(language=FRA, format=SRT, file_path=f, confidence=0.85)
|
t = SubtitleScanResult(language=FRA, format=SRT, file_path=f, confidence=0.85)
|
||||||
r = repr(t)
|
r = repr(t)
|
||||||
assert "fra" in r
|
assert "fra" in r
|
||||||
assert "fr.srt" in r
|
assert "fr.srt" in r
|
||||||
assert "0.85" in r
|
assert "0.85" in r
|
||||||
|
|
||||||
def test_unresolved_repr(self):
|
def test_unresolved_repr(self):
|
||||||
t = SubtitleCandidate(language=None, format=None)
|
t = SubtitleScanResult(language=None, format=None)
|
||||||
r = repr(t)
|
r = repr(t)
|
||||||
assert "?" in r
|
assert "?" in r
|
||||||
|
|
||||||
@@ -159,8 +160,8 @@ class TestMediaSubtitleMetadata:
|
|||||||
assert m.unresolved_tracks == []
|
assert m.unresolved_tracks == []
|
||||||
|
|
||||||
def test_aggregates_embedded_and_external(self):
|
def test_aggregates_embedded_and_external(self):
|
||||||
e = SubtitleCandidate(language=FRA, format=None, is_embedded=True)
|
e = SubtitleScanResult(language=FRA, format=None, is_embedded=True)
|
||||||
x = SubtitleCandidate(language=FRA, format=SRT, file_path=Path("/x.srt"))
|
x = SubtitleScanResult(language=FRA, format=SRT, file_path=Path("/x.srt"))
|
||||||
m = MediaSubtitleMetadata(
|
m = MediaSubtitleMetadata(
|
||||||
media_id=None,
|
media_id=None,
|
||||||
media_type="movie",
|
media_type="movie",
|
||||||
@@ -173,13 +174,13 @@ class TestMediaSubtitleMetadata:
|
|||||||
def test_unresolved_tracks_only_external_with_none_lang(self):
|
def test_unresolved_tracks_only_external_with_none_lang(self):
|
||||||
# An embedded with None language must NOT appear in unresolved_tracks
|
# An embedded with None language must NOT appear in unresolved_tracks
|
||||||
# (the property only iterates external_tracks).
|
# (the property only iterates external_tracks).
|
||||||
embedded_unknown = SubtitleCandidate(
|
embedded_unknown = SubtitleScanResult(
|
||||||
language=None, format=None, is_embedded=True
|
language=None, format=None, is_embedded=True
|
||||||
)
|
)
|
||||||
external_known = SubtitleCandidate(
|
external_known = SubtitleScanResult(
|
||||||
language=FRA, format=SRT, file_path=Path("/a.srt")
|
language=FRA, format=SRT, file_path=Path("/a.srt")
|
||||||
)
|
)
|
||||||
external_unknown = SubtitleCandidate(
|
external_unknown = SubtitleScanResult(
|
||||||
language=None, format=SRT, file_path=Path("/b.srt")
|
language=None, format=SRT, file_path=Path("/b.srt")
|
||||||
)
|
)
|
||||||
m = MediaSubtitleMetadata(
|
m = MediaSubtitleMetadata(
|
||||||
@@ -200,14 +201,14 @@ class TestAvailableSubtitles:
|
|||||||
def test_dedup_by_lang_and_type(self):
|
def test_dedup_by_lang_and_type(self):
|
||||||
ENG = SubtitleLanguage(code="eng", tokens=["en"])
|
ENG = SubtitleLanguage(code="eng", tokens=["en"])
|
||||||
tracks = [
|
tracks = [
|
||||||
SubtitleCandidate(
|
SubtitleScanResult(
|
||||||
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
||||||
),
|
),
|
||||||
SubtitleCandidate(
|
SubtitleScanResult(
|
||||||
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
|
||||||
),
|
),
|
||||||
SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH),
|
SubtitleScanResult(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH),
|
||||||
SubtitleCandidate(
|
SubtitleScanResult(
|
||||||
language=ENG, format=SRT, subtitle_type=SubtitleType.STANDARD
|
language=ENG, format=SRT, subtitle_type=SubtitleType.STANDARD
|
||||||
),
|
),
|
||||||
]
|
]
|
||||||
@@ -221,10 +222,10 @@ class TestAvailableSubtitles:
|
|||||||
|
|
||||||
def test_none_language_treated_as_key(self):
|
def test_none_language_treated_as_key(self):
|
||||||
# Tracks with no language form a single None-keyed bucket.
|
# Tracks with no language form a single None-keyed bucket.
|
||||||
t1 = SubtitleCandidate(
|
t1 = SubtitleScanResult(
|
||||||
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
||||||
)
|
)
|
||||||
t2 = SubtitleCandidate(
|
t2 = SubtitleScanResult(
|
||||||
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
|
||||||
)
|
)
|
||||||
result = available_subtitles([t1, t2])
|
result = available_subtitles([t1, t2])
|
||||||
@@ -257,7 +258,7 @@ class TestSubtitleRuleSet:
|
|||||||
def test_override_partial_keeps_parent_for_unset_fields(self):
|
def test_override_partial_keeps_parent_for_unset_fields(self):
|
||||||
parent = SubtitleRuleSet.global_default()
|
parent = SubtitleRuleSet.global_default()
|
||||||
child = SubtitleRuleSet(
|
child = SubtitleRuleSet(
|
||||||
scope=RuleScope(level="show", identifier="tt1"),
|
scope=RuleScope(level=RuleScopeLevel.SHOW, identifier="tt1"),
|
||||||
parent=parent,
|
parent=parent,
|
||||||
)
|
)
|
||||||
child.override(languages=["jpn"])
|
child.override(languages=["jpn"])
|
||||||
@@ -267,14 +268,14 @@ class TestSubtitleRuleSet:
|
|||||||
assert rules.min_confidence == parent.resolve(_DEFAULT_RULES).min_confidence
|
assert rules.min_confidence == parent.resolve(_DEFAULT_RULES).min_confidence
|
||||||
|
|
||||||
def test_to_dict_only_emits_set_deltas(self):
|
def test_to_dict_only_emits_set_deltas(self):
|
||||||
rs = SubtitleRuleSet(scope=RuleScope(level="show", identifier="tt1"))
|
rs = SubtitleRuleSet(scope=RuleScope(level=RuleScopeLevel.SHOW, identifier="tt1"))
|
||||||
rs.override(languages=["fra"])
|
rs.override(languages=["fra"])
|
||||||
out = rs.to_dict()
|
out = rs.to_dict()
|
||||||
assert out["scope"] == {"level": "show", "identifier": "tt1"}
|
assert out["scope"] == {"level": "show", "identifier": "tt1"}
|
||||||
assert out["override"] == {"languages": ["fra"]}
|
assert out["override"] == {"languages": ["fra"]}
|
||||||
|
|
||||||
def test_to_dict_full_override(self):
|
def test_to_dict_full_override(self):
|
||||||
rs = SubtitleRuleSet(scope=RuleScope(level="global"))
|
rs = SubtitleRuleSet(scope=RuleScope(level=RuleScopeLevel.GLOBAL))
|
||||||
rs.override(
|
rs.override(
|
||||||
languages=["fra"],
|
languages=["fra"],
|
||||||
formats=["srt"],
|
formats=["srt"],
|
||||||
|
|||||||
Vendored
+8
@@ -39,6 +39,14 @@ class ReleaseFixture:
|
|||||||
def routing(self) -> dict:
|
def routing(self) -> dict:
|
||||||
return self.data.get("routing", {})
|
return self.data.get("routing", {})
|
||||||
|
|
||||||
|
@property
|
||||||
|
def xfail_reason(self) -> str | None:
|
||||||
|
"""If set, the fixture is expected to fail — wrapped with
|
||||||
|
``pytest.mark.xfail`` by the test runner. Used for known
|
||||||
|
unsupported pathological cases (typically the PATH OF PAIN bucket).
|
||||||
|
"""
|
||||||
|
return self.data.get("xfail_reason")
|
||||||
|
|
||||||
def materialize(self, root: Path) -> None:
|
def materialize(self, root: Path) -> None:
|
||||||
"""Create the fixture's ``tree`` as empty files/dirs under ``root``."""
|
"""Create the fixture's ``tree`` as empty files/dirs under ``root``."""
|
||||||
for entry in self.tree:
|
for entry in self.tree:
|
||||||
|
|||||||
@@ -1,5 +1,10 @@
|
|||||||
release_name: "Deutschland 83-86-89 (2015) Season 1-3 S01-S03 (1080p BluRay x265 HEVC 10bit AAC 5.1 German Kappa)"
|
release_name: "Deutschland 83-86-89 (2015) Season 1-3 S01-S03 (1080p BluRay x265 HEVC 10bit AAC 5.1 German Kappa)"
|
||||||
|
|
||||||
|
# Out of SHITTY scope by design: parenthesized tech blocks, group name as
|
||||||
|
# the last bare word inside parens, year-suffix range in title, dual
|
||||||
|
# season expression. PATH OF PAIN handles this via LLM pre-analysis.
|
||||||
|
xfail_reason: "PoP-grade pathological franchise box-set, beyond simple-dict SHITTY"
|
||||||
|
|
||||||
# Pathological franchise box-set:
|
# Pathological franchise box-set:
|
||||||
# - Title contains year-suffix range "83-86-89" (3 years glued)
|
# - Title contains year-suffix range "83-86-89" (3 years glued)
|
||||||
# - Season range expressed twice: "Season 1-3" AND "S01-S03"
|
# - Season range expressed twice: "Season 1-3" AND "S01-S03"
|
||||||
|
|||||||
@@ -1,13 +1,15 @@
|
|||||||
release_name: "Khruangbin | Austin City Limits Music Festival 2024 | Full Set [V_-7WWPPeBs].webm"
|
release_name: "Khruangbin | Austin City Limits Music Festival 2024 | Full Set [V_-7WWPPeBs].webm"
|
||||||
|
|
||||||
# yt-dlp slug: UTF-8 wide pipe '｜' (U+FF5C, not the ASCII '|'), trailing
|
# yt-dlp slug: UTF-8 wide pipe '｜' (U+FF5C, not the ASCII '|'), trailing
|
||||||
# YouTube video ID in brackets, .webm extension. Parser extracts the year
|
# YouTube video ID in brackets, .webm extension. The wide pipe survives
|
||||||
# (2024) correctly but mistakes the YouTube ID '7WWPPeBs' for a release
|
# the tokenizer (not a separator) but is now dropped at title assembly
|
||||||
# group, and the wide pipe survives the tokenizer (not a separator).
|
# (pure-punctuation TITLE tokens carry no content). Year (2024) parses
|
||||||
|
# correctly; the YouTube ID '7WWPPeBs' is still mistaken for a release
|
||||||
|
# group (separate gap, see PoP backlog).
|
||||||
# This is a concert recording — closer to "live music" than "movie", but
|
# This is a concert recording — closer to "live music" than "movie", but
|
||||||
# media_type=movie is the current degenerate best guess.
|
# media_type=movie is the current degenerate best guess.
|
||||||
parsed:
|
parsed:
|
||||||
title: "Khruangbin.|.Austin.City.Limits.Music.Festival"
|
title: "Khruangbin.Austin.City.Limits.Music.Festival"
|
||||||
year: 2024
|
year: 2024
|
||||||
season: null
|
season: null
|
||||||
episode: null
|
episode: null
|
||||||
|
|||||||
+5
@@ -1,5 +1,10 @@
|
|||||||
release_name: "Predator Badlands 2025 1080p HDRip HEVC x265 BONE"
|
release_name: "Predator Badlands 2025 1080p HDRip HEVC x265 BONE"
|
||||||
|
|
||||||
|
# Space-separated release with both codec aliases present (HEVC + x265)
|
||||||
|
# and no dash-before-group. Simple-SHITTY first-wins picks HEVC, but the
|
||||||
|
# fixture expected x265 (legacy last-wins). Reclassified as PoP.
|
||||||
|
xfail_reason: "Space-separated, dual codec aliases, no dashed group"
|
||||||
|
|
||||||
# Space-separated release: tokenizer correctly splits and identifies year +
|
# Space-separated release: tokenizer correctly splits and identifies year +
|
||||||
# tech, but the dash-before-group convention is absent so 'BONE' is not
|
# tech, but the dash-before-group convention is absent so 'BONE' is not
|
||||||
# recognized as the group — falls to UNKNOWN. Anti-regression baseline.
|
# recognized as the group — falls to UNKNOWN. Anti-regression baseline.
|
||||||
@@ -1,5 +1,9 @@
|
|||||||
release_name: "SLEAFORD MODS Live Glastonbury June 27th 2015-niNjHn8abyY.mp4"
|
release_name: "SLEAFORD MODS Live Glastonbury June 27th 2015-niNjHn8abyY.mp4"
|
||||||
|
|
||||||
|
# YouTube-style slug with year-prefixed video-id dash suffix. Not a scene
|
||||||
|
# release shape at all — PATH OF PAIN.
|
||||||
|
xfail_reason: "YouTube slug with year-prefixed video-id, not a scene shape"
|
||||||
|
|
||||||
# yt-dlp filename: triple space between band name and event, no canonical
|
# yt-dlp filename: triple space between band name and event, no canonical
|
||||||
# tech markers, dashed YouTube video ID glued to the year, .mp4 extension
|
# tech markers, dashed YouTube video ID glued to the year, .mp4 extension
|
||||||
# preserved in the title. Parser:
|
# preserved in the title. Parser:
|
||||||
|
|||||||
@@ -1,5 +1,10 @@
|
|||||||
release_name: "Super Mario Bros. le film [FR-EN] (2023).mkv"
|
release_name: "Super Mario Bros. le film [FR-EN] (2023).mkv"
|
||||||
|
|
||||||
|
# Bare-dashed language pair interior to the title (``[FR-EN]``) is tagged
|
||||||
|
# as group by ``_detect_group``, leaving the title fragment behind.
|
||||||
|
# Out of simple-SHITTY scope.
|
||||||
|
xfail_reason: "Interior bare-dashed language pair confuses group detection"
|
||||||
|
|
||||||
# Hybrid English/French marketing title with:
|
# Hybrid English/French marketing title with:
|
||||||
# - Trailing period after 'Bros' that is part of the title abbreviation
|
# - Trailing period after 'Bros' that is part of the title abbreviation
|
||||||
# (not a separator), but tokenizer treats it as one
|
# (not a separator), but tokenizer treats it as one
|
||||||
|
|||||||
+16
-18
@@ -1,28 +1,26 @@
|
|||||||
release_name: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
|
release_name: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
|
||||||
|
|
||||||
# Apocalypse case combining every horror:
|
# Apocalypse case combining every horror — partially tamed by the
|
||||||
# - Unescaped apostrophe ("World's") → forces parse_path="ai" fallback
|
# apostrophe fix. Remaining gaps (still PoP-worthy):
|
||||||
# - Spaces AND dashes used as separators inconsistently
|
# - "1080i" interlaced flag (not in quality KB)
|
||||||
# - "Blu-ray" with a dash (vs. canonical BluRay)
|
# - "Blu-ray" with a dash (vs. canonical BluRay) — recognized as source
|
||||||
# - "1080i" interlaced flag (not 1080p)
|
# but with the dash form
|
||||||
# - "DTS-HD MA 5.1" multi-word audio codec
|
# - "DTS-HD MA 5.1" multi-word audio codec — the trailing "HD" leaks
|
||||||
# - " - GROUP.mkv" trailing format (space-dash-space before group)
|
# into the group
|
||||||
# - Trailing .mkv extension survives in title
|
# - Trailing .mkv extension survives in title
|
||||||
# Result: total degeneration — UNKNOWN across the board, title=raw input.
|
# - " - GROUP" trailing format (space-dash-space before group)
|
||||||
# Once the apostrophe + multi-word-audio + 1080i are handled this fixture
|
|
||||||
# should be revisited. For now: anti-regression of the failure shape.
|
|
||||||
parsed:
|
parsed:
|
||||||
title: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
|
title: "The.Prodigy.Worlds.on.Fire"
|
||||||
year: null
|
year: 2011
|
||||||
season: null
|
season: null
|
||||||
episode: null
|
episode: null
|
||||||
quality: null
|
quality: null
|
||||||
source: null
|
source: "Blu-ray"
|
||||||
codec: null
|
codec: "AVC"
|
||||||
group: "UNKNOWN"
|
group: "HD"
|
||||||
tech_string: ""
|
tech_string: "Blu-ray.AVC"
|
||||||
media_type: "unknown"
|
media_type: "movie"
|
||||||
parse_path: "ai"
|
parse_path: "sanitized"
|
||||||
is_season_pack: false
|
is_season_pack: false
|
||||||
|
|
||||||
tree:
|
tree:
|
||||||
|
|||||||
@@ -1,14 +1,13 @@
|
|||||||
release_name: "Archer.S14E09E10E11.1080p.WEB.h264-ETHEL"
|
release_name: "Archer.S14E09E10E11.1080p.WEB.h264-ETHEL"
|
||||||
|
|
||||||
# Tech debt: triple-episode chain (E09E10E11) — current parser captures
|
# Triple-episode chain (E09E10E11) — the parser collapses the chain to a
|
||||||
# episode=9 and episode_end=10, but E11 is lost. Anti-regression: lock in
|
# range (episode=first, episode_end=last). Intermediate values are implied.
|
||||||
# the partial behavior so any future improvement is intentional.
|
|
||||||
parsed:
|
parsed:
|
||||||
title: "Archer"
|
title: "Archer"
|
||||||
year: null
|
year: null
|
||||||
season: 14
|
season: 14
|
||||||
episode: 9
|
episode: 9
|
||||||
episode_end: 10
|
episode_end: 11
|
||||||
quality: "1080p"
|
quality: "1080p"
|
||||||
source: "WEB"
|
source: "WEB"
|
||||||
codec: "h264"
|
codec: "h264"
|
||||||
|
|||||||
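Review note: the Archer fixture above pins the new behaviour (`episode=first, episode_end=last`). The collapse could be sketched roughly as follows; `collapse_episode_chain` and its regex are hypothetical and are not taken from the parser in this repository:

```python
import re

# Hypothetical pattern: a season marker followed by one or more E-numbers.
_SEASON_EPISODES = re.compile(r"^S(\d{1,2})((?:E\d{1,3})+)$", re.IGNORECASE)


def collapse_episode_chain(token):
    """Return (season, episode, episode_end) for tokens such as S14E09E10E11.

    episode_end is None for a single episode. Returns None when the token
    is not a season/episode marker at all.
    """
    match = _SEASON_EPISODES.match(token)
    if match is None:
        return None
    season = int(match.group(1))
    episodes = [
        int(number)
        for number in re.findall(r"E(\d{1,3})", match.group(2), re.IGNORECASE)
    ]
    first, last = episodes[0], episodes[-1]
    # Intermediate episodes are implied by the range, so only the ends are kept.
    return season, first, (last if len(episodes) > 1 else None)
```

Keeping only the two ends matches the fixture's expectation of `episode: 9` and `episode_end: 11`, at the cost of assuming the chain is contiguous.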
+14
-13
@@ -1,21 +1,22 @@
|
|||||||
release_name: "Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265.EAC3.5.1-Amen"
|
release_name: "Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265.EAC3.5.1-Amen"
|
||||||
|
|
||||||
# Tech debt: the unescaped apostrophe in "Don't" pushes the whole release
|
# Apostrophes inside titles ("Don't", "L'avare") used to push the release
|
||||||
# through the AI fallback path (parse_path="ai") and the parse degenerates to
|
# through the AI fallback (parse_path="ai", everything UNKNOWN). They are
|
||||||
# UNKNOWN across the board. Anti-regression here — once the tokenizer learns
|
# now pre-stripped before the well-formed check and tokenization, so the parse
|
||||||
# to handle apostrophes, this fixture should be revisited.
|
# completes normally — only the title text loses its apostrophe
|
||||||
|
# ("Honey.Dont").
|
||||||
parsed:
|
parsed:
|
||||||
title: "Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265.EAC3.5.1-Amen"
|
title: "Honey.Dont"
|
||||||
year: null
|
year: 2025
|
||||||
season: null
|
season: null
|
||||||
episode: null
|
episode: null
|
||||||
quality: null
|
quality: "2160p"
|
||||||
source: null
|
source: "WEBRip"
|
||||||
codec: null
|
codec: "x265"
|
||||||
group: "UNKNOWN"
|
group: "Amen"
|
||||||
tech_string: ""
|
tech_string: "2160p.WEBRip.x265"
|
||||||
media_type: "unknown"
|
media_type: "movie"
|
||||||
parse_path: "ai"
|
parse_path: "sanitized"
|
||||||
is_season_pack: false
|
is_season_pack: false
|
||||||
|
|
||||||
tree:
|
tree:
|
||||||
|
|||||||
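Review note: the fixture comment above describes apostrophes being stripped before the well-formed check, with `parse_path` reported as `sanitized`. A minimal sketch of that pre-pass, assuming a hypothetical helper name and assuming the typographic apostrophe (U+2019) is handled the same way as the ASCII one:

```python
from typing import Tuple

# Assumption for this sketch: both the ASCII and the typographic apostrophe
# are removed. The real parser may treat a different character set.
_APOSTROPHES = str.maketrans("", "", "'\u2019")


def sanitize_release_name(name: str) -> Tuple[str, bool]:
    """Strip apostrophes and report whether anything changed.

    The boolean is what a caller could use to record a "sanitized"
    route instead of a "direct" one.
    """
    cleaned = name.translate(_APOSTROPHES)
    return cleaned, cleaned != name
```

For `Honey.Don't.2025.2160p` this yields `Honey.Dont.2025.2160p` and a `True` flag, so only the title text loses its apostrophe while the rest of the name reaches the tokenizer unchanged.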
@@ -1,7 +1,8 @@
|
|||||||
release_name: "Notre.planete.s01e01.1080p.NF.WEB-DL.DDP5.1.x264-NTb"
|
release_name: "Notre.planete.s01e01.1080p.NF.WEB-DL.DDP5.1.x264-NTb"
|
||||||
|
|
||||||
# Lowercase 's01e01' and lowercased title word ('planete') correctly parsed.
|
# Lowercase 's01e01' and lowercased title word ('planete') correctly parsed.
|
||||||
# NF (Netflix) source tag is not in the source KB — drops; WEB-DL wins.
|
# NF is the Netflix streaming distributor (separate dimension from source);
|
||||||
|
# WEB-DL is the encoding source.
|
||||||
parsed:
|
parsed:
|
||||||
title: "Notre.planete"
|
title: "Notre.planete"
|
||||||
year: null
|
year: null
|
||||||
@@ -11,6 +12,7 @@ parsed:
|
|||||||
source: "WEB-DL"
|
source: "WEB-DL"
|
||||||
codec: "x264"
|
codec: "x264"
|
||||||
group: "NTb"
|
group: "NTb"
|
||||||
|
distributor: "NF"
|
||||||
tech_string: "1080p.WEB-DL.x264"
|
tech_string: "1080p.WEB-DL.x264"
|
||||||
media_type: "tv_show"
|
media_type: "tv_show"
|
||||||
parse_path: "direct"
|
parse_path: "direct"
|
||||||
|
|||||||
+7
-7
@@ -1,22 +1,22 @@
|
|||||||
release_name: "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE"
|
release_name: "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE"
|
||||||
|
|
||||||
# Tech debt: range syntax 'S01-06' is not recognized as TV — falls through
|
# Range syntax 'S01-06' is now recognized as a season-range marker:
|
||||||
# to media_type=movie with the range glued onto the title. Captured here so a
|
# season=1 (first of the range), media_type=tv_complete, and the token
|
||||||
# future ranger-aware parser change is intentional.
|
# no longer leaks into the title.
|
||||||
parsed:
|
parsed:
|
||||||
title: "Der.Tatortreiniger.S01-06"
|
title: "Der.Tatortreiniger"
|
||||||
year: null
|
year: null
|
||||||
season: null
|
season: 1
|
||||||
episode: null
|
episode: null
|
||||||
quality: "1080p"
|
quality: "1080p"
|
||||||
source: "WEB"
|
source: "WEB"
|
||||||
codec: "x264"
|
codec: "x264"
|
||||||
group: "WAYNE"
|
group: "WAYNE"
|
||||||
tech_string: "1080p.WEB.x264"
|
tech_string: "1080p.WEB.x264"
|
||||||
media_type: "movie"
|
media_type: "tv_complete"
|
||||||
languages: ["GERMAN"]
|
languages: ["GERMAN"]
|
||||||
parse_path: "direct"
|
parse_path: "direct"
|
||||||
is_season_pack: false
|
is_season_pack: true
|
||||||
|
|
||||||
tree:
|
tree:
|
||||||
- "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE/"
|
- "Der.Tatortreiniger.S01-06.GERMAN.1080p.WEB.x264-WAYNE/"
|
||||||
|
|||||||
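Review note: the fixture above expects `S01-06` to be read as a season range, with `season` set to the first value. One way such a token could be recognised is sketched below; the function name and regex are hypothetical, and the optional second `S` (as in `S01-S03`) is an assumption rather than confirmed parser behaviour:

```python
import re

# Hypothetical pattern: S01-06 or S01-S03, case-insensitive.
_SEASON_RANGE = re.compile(r"^S(\d{1,2})-S?(\d{1,2})$", re.IGNORECASE)


def parse_season_range(token):
    """Return (first_season, last_season), or None if the token is not a range."""
    match = _SEASON_RANGE.match(token)
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        # A descending range is treated as not-a-range in this sketch.
        return None
    return first, last
```

Matching the whole token keeps the range marker out of the title, which is the leak the old fixture (`title: "Der.Tatortreiniger.S01-06"`) had locked in.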
@@ -1,11 +1,12 @@
|
|||||||
release_name: "Vinyl - 1x01 - FHD"
|
release_name: "Vinyl - 1x01 - FHD"
|
||||||
|
|
||||||
# Tech debt: surrounding ' - ' separators leave a stray '-' token attached
|
# Surrounding ' - ' separators in human-friendly release names left stray
|
||||||
# to the title ("Vinyl.-"). NxNN form correctly identifies S01E01; everything
|
# '-' tokens attached to the title. They are now dropped at assembly time
|
||||||
# tech-side empty (no quality token in KB — "FHD" not yet known). Anti-regression
|
# (pure-punctuation TITLE tokens carry no content). NxNN form correctly
|
||||||
# the current degenerate title so a future fix is intentional.
|
# identifies S01E01; tech-side stays empty (no quality token in KB — "FHD"
|
||||||
|
# not yet known).
|
||||||
parsed:
|
parsed:
|
||||||
title: "Vinyl.-"
|
title: "Vinyl"
|
||||||
year: null
|
year: null
|
||||||
season: 1
|
season: 1
|
||||||
episode: 1
|
episode: 1
|
||||||
|
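The updated comment says pure-punctuation TITLE tokens are dropped at assembly time because they carry no content. A minimal sketch of such a filter is shown below. `assemble_title` is a hypothetical helper, not the project's function, and joining the surviving tokens with '.' is an assumption based on the fixture titles.

```python
from typing import Sequence


def assemble_title(tokens: Sequence[str]) -> str:
    """Join title tokens, skipping any token with no letter or digit."""
    kept = [tok for tok in tokens if any(ch.isalnum() for ch in tok)]
    return ".".join(kept)


# The stray '-' left by the ' - ' separators is dropped.
title = assemble_title(["Vinyl", "-"])
```

Checking for at least one alphanumeric character, rather than listing punctuation to reject, keeps tokens such as `Don't` or `Spider-Man` intact while still discarding a lone `-`.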
|||||||
@@ -0,0 +1,155 @@
|
|||||||
|
"""Tests for :class:`FfprobeMediaProber`.
|
||||||
|
|
||||||
|
Covers the full-probe path (``probe()`` returning a ``MediaInfo``) by
|
||||||
|
patching ``subprocess.run`` at the adapter module level. The
|
||||||
|
subtitle-streams path is exercised by the subtitle domain tests via
|
||||||
|
the same adapter.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import subprocess
|
||||||
|
from unittest.mock import MagicMock, patch
|
||||||
|
|
||||||
|
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||||
|
|
||||||
|
_PROBER = FfprobeMediaProber()
|
||||||
|
_PATCH_TARGET = "alfred.infrastructure.probe.ffprobe_prober.subprocess.run"
|
||||||
|
|
||||||
|
|
||||||
|
def _ffprobe_result(returncode=0, stdout="{}", stderr="") -> MagicMock:
|
||||||
|
return MagicMock(returncode=returncode, stdout=stdout, stderr=stderr)
|
||||||
|
|
||||||
|
|
||||||
|
class TestProbe:
|
||||||
|
def test_timeout_returns_none(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
side_effect=subprocess.TimeoutExpired(cmd="ffprobe", timeout=30),
|
||||||
|
):
|
||||||
|
assert _PROBER.probe(f) is None
|
||||||
|
|
||||||
|
def test_nonzero_returncode_returns_none(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(returncode=1, stderr="not a media file"),
|
||||||
|
):
|
||||||
|
assert _PROBER.probe(f) is None
|
||||||
|
|
||||||
|
def test_invalid_json_returns_none(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(stdout="not json {"),
|
||||||
|
):
|
||||||
|
assert _PROBER.probe(f) is None
|
||||||
|
|
||||||
|
def test_parses_format_duration_and_bitrate(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
payload = {
|
||||||
|
"format": {"duration": "1234.5", "bit_rate": "5000000"},
|
||||||
|
"streams": [],
|
||||||
|
}
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
||||||
|
):
|
||||||
|
info = _PROBER.probe(f)
|
||||||
|
assert info is not None
|
||||||
|
assert info.duration_seconds == 1234.5
|
||||||
|
assert info.bitrate_kbps == 5000 # bit_rate // 1000
|
||||||
|
|
||||||
|
def test_invalid_numeric_format_fields_skipped(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
payload = {
|
||||||
|
"format": {"duration": "garbage", "bit_rate": "also-bad"},
|
||||||
|
"streams": [],
|
||||||
|
}
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
||||||
|
):
|
||||||
|
info = _PROBER.probe(f)
|
||||||
|
assert info is not None
|
||||||
|
assert info.duration_seconds is None
|
||||||
|
assert info.bitrate_kbps is None
|
||||||
|
|
||||||
|
def test_parses_streams(self, tmp_path):
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
payload = {
|
||||||
|
"format": {},
|
||||||
|
"streams": [
|
||||||
|
{
|
||||||
|
"index": 0,
|
||||||
|
"codec_type": "video",
|
||||||
|
"codec_name": "h264",
|
||||||
|
"width": 1920,
|
||||||
|
"height": 1080,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"index": 1,
|
||||||
|
"codec_type": "audio",
|
||||||
|
"codec_name": "ac3",
|
||||||
|
"channels": 6,
|
||||||
|
"channel_layout": "5.1",
|
||||||
|
"tags": {"language": "eng"},
|
||||||
|
"disposition": {"default": 1},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"index": 2,
|
||||||
|
"codec_type": "audio",
|
||||||
|
"codec_name": "aac",
|
||||||
|
"channels": 2,
|
||||||
|
"tags": {"language": "fra"},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"index": 3,
|
||||||
|
"codec_type": "subtitle",
|
||||||
|
"codec_name": "subrip",
|
||||||
|
"tags": {"language": "fra"},
|
||||||
|
"disposition": {"forced": 1},
|
||||||
|
},
|
||||||
|
],
|
||||||
|
}
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
||||||
|
):
|
||||||
|
info = _PROBER.probe(f)
|
||||||
|
assert info.video_codec == "h264"
|
||||||
|
assert info.width == 1920 and info.height == 1080
|
||||||
|
assert len(info.audio_tracks) == 2
|
||||||
|
eng = info.audio_tracks[0]
|
||||||
|
assert eng.language == "eng"
|
||||||
|
assert eng.is_default is True
|
||||||
|
assert info.audio_tracks[1].is_default is False
|
||||||
|
assert len(info.subtitle_tracks) == 1
|
||||||
|
assert info.subtitle_tracks[0].is_forced is True
|
||||||
|
|
||||||
|
def test_first_video_stream_wins(self, tmp_path):
|
||||||
|
# The implementation only fills video_codec on the FIRST video stream.
|
||||||
|
f = tmp_path / "x.mkv"
|
||||||
|
f.write_bytes(b"")
|
||||||
|
payload = {
|
||||||
|
"format": {},
|
||||||
|
"streams": [
|
||||||
|
{"codec_type": "video", "codec_name": "h264", "width": 1920},
|
||||||
|
{"codec_type": "video", "codec_name": "hevc", "width": 3840},
|
||||||
|
],
|
||||||
|
}
|
||||||
|
with patch(
|
||||||
|
_PATCH_TARGET,
|
||||||
|
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
||||||
|
):
|
||||||
|
info = _PROBER.probe(f)
|
||||||
|
assert info.video_codec == "h264"
|
||||||
|
assert info.width == 1920
|
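The tests above pin down a few behaviours: invalid JSON yields None, unparseable numeric format fields are skipped, the bitrate is `bit_rate // 1000`, and only the first video stream fills the video fields. The sketch below shows one way to get those behaviours from an ffprobe JSON string. It is not the `FfprobeMediaProber` implementation: `ProbeSummary` and `summarize` are hypothetical names, and the real `MediaInfo` has more fields (audio and subtitle tracks) that are left out here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeSummary:
    duration_seconds: Optional[float] = None
    bitrate_kbps: Optional[int] = None
    video_codec: Optional[str] = None
    width: Optional[int] = None
    height: Optional[int] = None


def summarize(stdout: str) -> Optional[ProbeSummary]:
    """Turn ffprobe-style JSON text into a small summary, or None if unusable."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None

    summary = ProbeSummary()
    fmt = payload.get("format") or {}

    # Numeric fields arrive as strings; bad values are skipped, not fatal.
    try:
        summary.duration_seconds = float(fmt["duration"])
    except (KeyError, TypeError, ValueError):
        pass
    try:
        summary.bitrate_kbps = int(fmt["bit_rate"]) // 1000
    except (KeyError, TypeError, ValueError):
        pass

    # Only the first video stream populates the video fields.
    seen_video = False
    for stream in payload.get("streams") or []:
        if stream.get("codec_type") == "video" and not seen_video:
            seen_video = True
            summary.video_codec = stream.get("codec_name")
            summary.width = stream.get("width")
            summary.height = stream.get("height")
    return summary
```

Keeping each numeric conversion in its own `try` block means one malformed field does not discard the other, which is what `test_invalid_numeric_format_fields_skipped` expects of the real adapter.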
||||||
@@ -1,21 +1,19 @@
|
|||||||
"""Tests for the smaller ``alfred.infrastructure.filesystem`` helpers.
|
"""Tests for the smaller ``alfred.infrastructure.filesystem`` helpers.
|
||||||
|
|
||||||
Covers four siblings of ``FileManager`` that had near-zero coverage:
|
Covers three siblings of ``FileManager`` that had near-zero coverage:
|
||||||
|
|
||||||
- ``ffprobe.probe`` — wraps ``ffprobe`` JSON output into a ``MediaInfo``.
|
|
||||||
- ``filesystem_operations.create_folder`` / ``move`` — thin
|
- ``filesystem_operations.create_folder`` / ``move`` — thin
|
||||||
``mkdir`` / ``mv`` wrappers returning dict-shaped responses.
|
``mkdir`` / ``mv`` wrappers returning dict-shaped responses.
|
||||||
- ``organizer.MediaOrganizer`` — computes destination paths for movies
|
- ``organizer.MediaOrganizer`` — computes destination paths for movies
|
||||||
and TV episodes; creates folders for them.
|
and TV episodes; creates folders for them.
|
||||||
- ``find_video.find_video_file`` — first-video lookup in a folder.
|
- ``find_video.find_video_file`` — first-video lookup in a folder.
|
||||||
|
|
||||||
External commands (``ffprobe`` / ``mv``) are patched via ``subprocess.run``.
|
(``ffprobe`` coverage now lives in ``test_ffprobe_prober.py`` alongside
|
||||||
|
its adapter.)
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import json
|
|
||||||
import subprocess
|
|
||||||
from unittest.mock import MagicMock, patch
|
from unittest.mock import MagicMock, patch
|
||||||
|
|
||||||
from alfred.domain.movies.entities import Movie
|
from alfred.domain.movies.entities import Movie
|
||||||
@@ -27,7 +25,6 @@ from alfred.domain.tv_shows.value_objects import (
|
|||||||
SeasonNumber,
|
SeasonNumber,
|
||||||
ShowStatus,
|
ShowStatus,
|
||||||
)
|
)
|
||||||
from alfred.infrastructure.filesystem import ffprobe
|
|
||||||
from alfred.infrastructure.filesystem.filesystem_operations import (
|
from alfred.infrastructure.filesystem.filesystem_operations import (
|
||||||
create_folder,
|
create_folder,
|
||||||
move,
|
move,
|
||||||
@@ -38,147 +35,6 @@ from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
|||||||
|
|
||||||
_KB = YamlReleaseKnowledge()
|
_KB = YamlReleaseKnowledge()
|
||||||
|
|
||||||
# --------------------------------------------------------------------------- #
|
|
||||||
# ffprobe.probe #
|
|
||||||
# --------------------------------------------------------------------------- #
|
|
||||||
|
|
||||||
|
|
||||||
def _ffprobe_result(returncode=0, stdout="{}", stderr="") -> MagicMock:
|
|
||||||
return MagicMock(returncode=returncode, stdout=stdout, stderr=stderr)
|
|
||||||
|
|
||||||
|
|
||||||
class TestFfprobe:
|
|
||||||
def test_timeout_returns_none(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
side_effect=subprocess.TimeoutExpired(cmd="ffprobe", timeout=30),
|
|
||||||
):
|
|
||||||
assert ffprobe.probe(f) is None
|
|
||||||
|
|
||||||
def test_nonzero_returncode_returns_none(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(returncode=1, stderr="not a media file"),
|
|
||||||
):
|
|
||||||
assert ffprobe.probe(f) is None
|
|
||||||
|
|
||||||
def test_invalid_json_returns_none(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(stdout="not json {"),
|
|
||||||
):
|
|
||||||
assert ffprobe.probe(f) is None
|
|
||||||
|
|
||||||
def test_parses_format_duration_and_bitrate(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
payload = {
|
|
||||||
"format": {"duration": "1234.5", "bit_rate": "5000000"},
|
|
||||||
"streams": [],
|
|
||||||
}
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
|
||||||
):
|
|
||||||
info = ffprobe.probe(f)
|
|
||||||
assert info is not None
|
|
||||||
assert info.duration_seconds == 1234.5
|
|
||||||
assert info.bitrate_kbps == 5000 # bit_rate // 1000
|
|
||||||
|
|
||||||
def test_invalid_numeric_format_fields_skipped(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
payload = {
|
|
||||||
"format": {"duration": "garbage", "bit_rate": "also-bad"},
|
|
||||||
"streams": [],
|
|
||||||
}
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
|
||||||
):
|
|
||||||
info = ffprobe.probe(f)
|
|
||||||
assert info is not None
|
|
||||||
assert info.duration_seconds is None
|
|
||||||
assert info.bitrate_kbps is None
|
|
||||||
|
|
||||||
def test_parses_streams(self, tmp_path):
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
payload = {
|
|
||||||
"format": {},
|
|
||||||
"streams": [
|
|
||||||
{
|
|
||||||
"index": 0,
|
|
||||||
"codec_type": "video",
|
|
||||||
"codec_name": "h264",
|
|
||||||
"width": 1920,
|
|
||||||
"height": 1080,
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"index": 1,
|
|
||||||
"codec_type": "audio",
|
|
||||||
"codec_name": "ac3",
|
|
||||||
"channels": 6,
|
|
||||||
"channel_layout": "5.1",
|
|
||||||
"tags": {"language": "eng"},
|
|
||||||
"disposition": {"default": 1},
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"index": 2,
|
|
||||||
"codec_type": "audio",
|
|
||||||
"codec_name": "aac",
|
|
||||||
"channels": 2,
|
|
||||||
"tags": {"language": "fra"},
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"index": 3,
|
|
||||||
"codec_type": "subtitle",
|
|
||||||
"codec_name": "subrip",
|
|
||||||
"tags": {"language": "fra"},
|
|
||||||
"disposition": {"forced": 1},
|
|
||||||
},
|
|
||||||
],
|
|
||||||
}
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
|
||||||
):
|
|
||||||
info = ffprobe.probe(f)
|
|
||||||
assert info.video_codec == "h264"
|
|
||||||
assert info.width == 1920 and info.height == 1080
|
|
||||||
assert len(info.audio_tracks) == 2
|
|
||||||
eng = info.audio_tracks[0]
|
|
||||||
assert eng.language == "eng"
|
|
||||||
assert eng.is_default is True
|
|
||||||
assert info.audio_tracks[1].is_default is False
|
|
||||||
assert len(info.subtitle_tracks) == 1
|
|
||||||
assert info.subtitle_tracks[0].is_forced is True
|
|
||||||
|
|
||||||
def test_first_video_stream_wins(self, tmp_path):
|
|
||||||
# The implementation only fills video_codec on the FIRST video stream.
|
|
||||||
f = tmp_path / "x.mkv"
|
|
||||||
f.write_bytes(b"")
|
|
||||||
payload = {
|
|
||||||
"format": {},
|
|
||||||
"streams": [
|
|
||||||
{"codec_type": "video", "codec_name": "h264", "width": 1920},
|
|
||||||
{"codec_type": "video", "codec_name": "hevc", "width": 3840},
|
|
||||||
],
|
|
||||||
}
|
|
||||||
with patch(
|
|
||||||
"alfred.infrastructure.filesystem.ffprobe.subprocess.run",
|
|
||||||
return_value=_ffprobe_result(stdout=json.dumps(payload)),
|
|
||||||
):
|
|
||||||
info = ffprobe.probe(f)
|
|
||||||
assert info.video_codec == "h264"
|
|
||||||
assert info.width == 1920
|
|
||||||
|
|
||||||
|
|
||||||
# --------------------------------------------------------------------------- #
|
# --------------------------------------------------------------------------- #
|
||||||
# filesystem_operations #
|
# filesystem_operations #
|
||||||
|
|||||||
@@ -0,0 +1,82 @@
|
|||||||
|
"""Tests for ``LanguageRegistry`` — the YAML-backed adapter for the
|
||||||
|
:class:`alfred.domain.shared.ports.LanguageRepository` port.
|
||||||
|
|
||||||
|
The port is structural (Protocol), so the assertion that the adapter
|
||||||
|
satisfies it is a static one — we exercise the public surface here and
|
||||||
|
let mypy / runtime polymorphism do the rest.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from alfred.domain.shared.ports import LanguageRepository
|
||||||
|
from alfred.domain.shared.value_objects import Language
|
||||||
|
from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
|
||||||
|
|
||||||
|
|
||||||
|
def _registry() -> LanguageRepository:
|
||||||
|
"""Return a fresh registry typed as the port — proves structural fit."""
|
||||||
|
return LanguageRegistry()
|
||||||
|
|
||||||
|
|
||||||
|
class TestPortSurface:
|
||||||
|
def test_satisfies_protocol(self):
|
||||||
|
# If LanguageRegistry diverged from LanguageRepository, the annotation
|
||||||
|
# below would already be wrong at type-check time; at runtime, this
|
||||||
|
# just confirms the methods exist.
|
||||||
|
reg: LanguageRepository = LanguageRegistry()
|
||||||
|
assert hasattr(reg, "from_iso")
|
||||||
|
assert hasattr(reg, "from_any")
|
||||||
|
assert hasattr(reg, "all")
|
||||||
|
|
||||||
|
def test_len_reflects_loaded_entries(self):
|
||||||
|
reg = _registry()
|
||||||
|
# The builtin YAML ships dozens of languages — exact count drifts
|
||||||
|
# with knowledge updates, so just sanity-check it's non-empty.
|
||||||
|
assert len(reg) > 0
|
||||||
|
|
||||||
|
|
||||||
|
class TestFromIso:
|
||||||
|
def test_known_iso_returns_language(self):
|
||||||
|
reg = _registry()
|
||||||
|
fre = reg.from_iso("fre")
|
||||||
|
assert isinstance(fre, Language)
|
||||||
|
assert fre.iso == "fre"
|
||||||
|
|
||||||
|
def test_case_insensitive(self):
|
||||||
|
reg = _registry()
|
||||||
|
assert reg.from_iso("FRE") == reg.from_iso("fre")
|
||||||
|
|
||||||
|
def test_unknown_iso_returns_none(self):
|
||||||
|
assert _registry().from_iso("zzz") is None
|
||||||
|
|
||||||
|
def test_non_string_returns_none(self):
|
||||||
|
assert _registry().from_iso(None) is None # type: ignore[arg-type]
|
||||||
|
|
||||||
|
|
||||||
|
class TestFromAny:
|
||||||
|
def test_english_name(self):
|
||||||
|
reg = _registry()
|
||||||
|
lang = reg.from_any("French")
|
||||||
|
assert lang is not None
|
||||||
|
assert lang.iso == "fre"
|
||||||
|
|
||||||
|
def test_iso_639_1_alias(self):
|
||||||
|
# "fr" is the 639-1 form, registered as an alias.
|
||||||
|
reg = _registry()
|
||||||
|
lang = reg.from_any("fr")
|
||||||
|
assert lang is not None
|
||||||
|
assert lang.iso == "fre"
|
||||||
|
|
||||||
|
def test_unknown_returns_none(self):
|
||||||
|
assert _registry().from_any("vostfr") is None
|
||||||
|
|
||||||
|
def test_non_string_returns_none(self):
|
||||||
|
assert _registry().from_any(123) is None # type: ignore[arg-type]
|
||||||
|
|
||||||
|
|
||||||
|
class TestMembership:
|
||||||
|
def test_contains_known(self):
|
||||||
|
assert "english" in _registry()
|
||||||
|
|
||||||
|
def test_does_not_contain_unknown(self):
|
||||||
|
assert "klingon" not in _registry()
|
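The module docstring above notes that the port is a structural `Protocol`, so conformance is mostly a type-checker concern. The sketch below illustrates the pattern on a toy port. `LanguageLookup` and `DictLookup` are hypothetical stand-ins, not the project's `LanguageRepository` or `LanguageRegistry`, and the `@runtime_checkable` decorator is an addition for the demonstration (the tests above rely on `hasattr` instead).

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class LanguageLookup(Protocol):
    """Toy port: anything with a matching from_iso method satisfies it."""

    def from_iso(self, code: str) -> Optional[str]: ...


class DictLookup:
    """Toy adapter; note it does not inherit from LanguageLookup."""

    def __init__(self) -> None:
        self._by_iso = {"fre": "French", "eng": "English"}

    def from_iso(self, code: str) -> Optional[str]:
        if not isinstance(code, str):
            return None  # tolerate bad input, as the tests above expect
        return self._by_iso.get(code.lower())


# A static type checker verifies this assignment structurally.
lookup: LanguageLookup = DictLookup()
```

With `@runtime_checkable`, `isinstance(lookup, LanguageLookup)` only checks that the method names exist, not their signatures, so the annotation plus a type checker remains the stronger guarantee.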
||||||
@@ -16,7 +16,7 @@ from __future__ import annotations
|
|||||||
|
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles.entities import SubtitleScanResult
|
||||||
from alfred.application.subtitles.placer import PlacedTrack
|
from alfred.application.subtitles.placer import PlacedTrack
|
||||||
from alfred.domain.subtitles.value_objects import (
|
from alfred.domain.subtitles.value_objects import (
|
||||||
SubtitleFormat,
|
SubtitleFormat,
|
||||||
@@ -32,8 +32,8 @@ ENG = SubtitleLanguage(code="eng", tokens=["en"])
|
|||||||
|
|
||||||
def _track(
|
def _track(
|
||||||
lang=FRA, *, embedded: bool = False, confidence: float = 0.92
|
lang=FRA, *, embedded: bool = False, confidence: float = 0.92
|
||||||
) -> SubtitleCandidate:
|
) -> SubtitleScanResult:
|
||||||
return SubtitleCandidate(
|
return SubtitleScanResult(
|
||||||
language=lang,
|
language=lang,
|
||||||
format=SRT,
|
format=SRT,
|
||||||
subtitle_type=SubtitleType.STANDARD,
|
subtitle_type=SubtitleType.STANDARD,
|
||||||
|
|||||||