Le champ + son validator étaient orphelins depuis la suppression
de MovieService.validate_movie_file. L'exclusion par extension
(application/release/supported_media.py) + le PoP couvrent désormais
la règle 'vrai film vs sample'. Si on a un jour besoin d'un seuil de
taille, il ira dans data/knowledge/, pas dans settings.
Six small files (audio, video, subtitle, info, matching, tracks_mixin
+ __init__) collapsed into one ~250 LoC media.py module. Python treats
media.py and media/__init__.py interchangeably, so the 12 import sites
that read 'from alfred.domain.shared.media import ...' continue to work
without changes.
Reasoning: the whole bounded context fits on one screen; splitting into
sub-modules added more navigation friction than it saved. Tests stay
green (1077 passed).
Apostrophes are in the forbidden-chars list, which made any release
with a title like "Don't" or "L'avare" short-circuit to the AI
fallback (parse_path=ai, everything UNKNOWN). They are now stripped
up front from the name before the well-formed check and tokenize,
so the parse completes normally. The raw name is preserved on the
VO; only the title field loses its apostrophe.
parse_path becomes 'sanitized' when an apostrophe was stripped, to
surface that the parser cleaned something up.
Fixtures updated:
- shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse
(title=Honey.Dont, year=2025, quality=2160p, source=WEBRip,
codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/ — went from total failure to
partial success (title, year, source, codec extracted). Remaining
gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked
separately in tech debt.
`Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie
with 'S01-06' glued to the title. The parser now matches the
season-range form in _parse_season_episode (returning season=first,
episode=None), and the assemble step detects the range token to
promote media_type to 'tv_complete'.
The first season is exposed as `season` so `is_season_pack`
fires (season is not None and episode is None) — useful for routing
to a series root folder.
Fixture shitty/tatortreiniger_flat_multiseason/ updated:
- title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger
- season: null → 1
- media_type: movie → tv_complete
- is_season_pack: false → true
Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were
ending up in title_parts and leaked into the joined title
('Vinyl.-'). We can't add '-' to the separator list (it would break
codec-GROUP), so we filter at assembly: a TITLE token with no
alphanumeric characters carries no title content.
Side win: same logic eliminates the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.
Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11
was silently dropped. The parser now takes episodes[-1] as
episode_end so the full chain is captured (episode=9, episode_end=11).
Intermediate values stay implied.
Fixture shitty/archer_multi_episode/ updated from anti-regression of
the bug to anti-regression of the fix.
Mirror the MediaProber / FilesystemScanner pattern for language lookup:
- New Protocol `LanguageRepository` in alfred.domain.shared.ports
covering from_iso, from_any, all, __contains__, __len__ — the
surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
against the Protocol; the concrete LanguageRegistry stays in
infrastructure as the YAML-backed adapter and remains the default
when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
cover the adapter surface (from_iso, from_any, membership,
case-insensitivity, non-string inputs).
Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.
The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.
Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:
- source path provided AND it exists -> inspect_release(name, path)
runs the full pipeline (parse + media-type refinement + probe
+ enrich), so missing tech tokens (quality, codec, ...) get
filled by ffprobe and the refreshed tech_string lands in the
destination folder / file names.
- source path missing or absent -> parse_release(name) only,
same behavior as before. Back-compat: tests using fake /dl/*.mkv
paths still pass unchanged.
resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.
The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.
Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.
Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.
Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
New application-layer entry point that composes the four inspection
layers in one call:
1. parse_release(name, kb) -> (ParsedRelease, ParseReport)
2. detect_media_type(parsed, path, kb) -> patch parsed.media_type
3. find_main_video(path, kb) -> Path | None (top-level scan)
4. prober.probe(video) + enrich -> when video exists and
media_type not in
{unknown, other}
Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.
analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.
12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.
Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).
Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.
- is_supported_video(path, kb): extension-only check against
kb.video_extensions. Lowercased suffix lookup. Directories and
broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
subdirectories — releases that wrap their video in Sample/ are
PATH_OF_PAIN territory). Lexicographically-first eligible file wins
when several qualify (deterministic, no size-based ranking). A bare
file as folder argument is supported for single-file releases.
No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.
17 tests under tests/application/test_supported_media.py.
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.
Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
body to be fully consumed. Tokens passed over between schema chunks
remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
a given role, skipping already-annotated tokens. Lets optional and
mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
kb.editions are matched first (longest-first ordering preserved from
the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
of a matched sequence with extra['sequence']=<canonical value> and
trailing members with extra['sequence_member']='True' so assemble
skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
'.' separator splits the layout into two tokens. Strips a trailing
'-GROUP' suffix on the second before joining.
Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
bit_depth, hdr_format, edition. Each role-handler skips
sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
COLLECTION} + no season → tv_complete (mirrors legacy).
Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
8 skipped.
Refs: project_release_parser_v2_specs (memory)
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.
Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
shape, fallback to any non-source dashed token), GroupSchema lookup
via the kb port, then lockstep walk of tokens against schema chunks.
Optional chunks skip on mismatch, mandatory mismatches return None so
the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
dict (title joined by '.', group from the codec-GROUP token's extras).
Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.
Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
*.yaml at construction time; learned overrides in
data/knowledge/release/release_groups/ also picked up.
Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
canonical chunk_order. ELiTE marks source as optional (Foundation.S02
has no WEBRip token).
Services:
- parse_release tries the v2 path first; on None falls through to the
legacy implementation untouched.
Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
detection (codec-GROUP, dashed-source skip, no-dash → unknown),
schema-driven annotation (movie, TV episode, season pack with
optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
still produced by the legacy path. Verified via spy on v2.assemble.
Suite: 1007 passed, 8 skipped.
Refs: project_release_parser_v2_specs (memory)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:
- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
(group right-to-left, then season/episode, year, tech, title).
Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.
Refs: project_release_parser_v2_specs (memory)
Domain services (SubtitleIdentifier, PatternDetector) used to import the
concrete SubtitleKnowledgeBase class directly from infrastructure for
their type hint. With this commit they depend on a structural Protocol
in alfred/domain/subtitles/ports/knowledge.py declaring just the 7
read-only query methods the domain actually consumes.
The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains
the sole adapter — no rename, no shim. With this change
alfred/domain/subtitles/ has zero imports from alfred/infrastructure/.
Also extend the changelog entry covering the full domain-io-extraction
branch.
Low-risk cleanup items, no functional change to the parser. The
philosophy remains: keep the parser simple, the AI handles edge cases.
- Extract duplicated 'fs-safe title → dot-folder-name' regex into
to_dot_folder_name() in domain/shared/value_objects.py. Used by both
MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead
of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree
for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa)
into a dedicated 'subtitle:' category instead of lumping them under
'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains
the union of both — detect_media_type behavior unchanged. New loader
load_subtitle_extensions() exposes the distinct subtitle set for future
callers in the subtitles domain (#20).
Suite: 1020 passed, 8 skipped.
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.
Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
= full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
COMPLETE token correctly drives tv_complete + REPACK + 10bit)
Also adds shitty + path_of_pain to the per-bucket sanity assertion.
Suite: 1020 passed, 8 skipped.
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/
covering the awkward but real-world release names from the downloads
folder. Each fixture locks in the current parse_release behavior so
future parser changes are intentional, not silent drift.
Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost — tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie — tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe — full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN — tech debt)
- Westworld S04 Subs.Only (no video file)
Each fixture also captures the future 3-flow routing (library /
torrents / seed_hardlinks) ahead of the organize_media refactor.
Suite: 1011 passed, 8 skipped.
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.
Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.
Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).
Also tracks CHANGELOG.md under [Unreleased] / Added.