feat(release): wire v2 EASY path for known release groups

The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
This commit is contained in:
2026-05-20 00:21:11 +02:00
parent a2c917618f
commit 075a827b0e
12 changed files with 730 additions and 70 deletions
+25 -9
View File
@@ -17,15 +17,31 @@ callers).
### Added
- **Release parser v2 scaffolding** (`alfred/domain/release/parser/`):
new package laying the foundation for an annotate-based pipeline
(tokenize → annotate → assemble). Exposes `Token` (frozen VO with
`index` + `role` + `extra`), `TokenRole` enum (structural / technical /
meta families), and a `pipeline.py` module with working `strip_site_tag`
+ `tokenize` and a documented `annotate` stub. Legacy `parse_release`
in `release.services` remains the live implementation until the
annotate step is wired in. Scaffolding tests in
`tests/domain/release/test_parser_v2_scaffolding.py`.
- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
new annotate-based pipeline (tokenize → annotate → assemble) drives
releases from known groups. Exposes `Token` (frozen VO with `index` +
`role` + `extra`), `TokenRole` enum (structural/technical/meta families),
and `GroupSchema` / `SchemaChunk` value objects.
- `pipeline.tokenize`: string-ops separator split (no regex), strips
a `[site.tag]` prefix/suffix first.
- `pipeline.annotate`: detects the trailing group right-to-left
(priority to `codec-GROUP` shape, fallback to any non-source dashed
token), looks up its `GroupSchema`, then walks tokens and schema
chunks in lockstep — optional chunks that don't match are skipped,
mandatory mismatches abort EASY and return `None` so the caller can
fall back to SHITTY.
- `pipeline.assemble`: folds annotated tokens into a
`ParsedRelease`-compatible dict.
- `parse_release` (in `release.services`) tries the v2 EASY path first
and falls through to the legacy SHITTY heuristic on `None`. Legacy
SHITTY/PATH OF PAIN behavior is unchanged.
- Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
rarbg}.yaml` declare the canonical chunk order per group, loaded via
new `ReleaseKnowledge.group_schema(name)` port method.
- Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
cover token VOs, site-tag stripping, group detection, schema-driven
annotation (movie, TV episode, season pack with optional source),
and field assembly.
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
each documenting an expected `ParsedRelease` plus the future `routing`