refactor(release): simplify SHITTY to dict-driven token tagging

Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty. Each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, and
the leftmost contiguous run of UNKNOWN tokens becomes the title.
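
A minimal sketch of the tagging pass described above, not the actual
pipeline._annotate_shitty code. The bucket contents, the regexes, the
dot-only tokenizer and the names KB, tag_token and annotate_sketch are
all assumptions made for illustration; the real kb is not shown in this
commit.

```python
import re

# Hypothetical kb buckets. Dict order doubles as lookup priority,
# which is what gives first-match-wins semantics.
KB = {
    "resolution": {"720p", "1080p", "2160p"},
    "source": {"bluray", "web", "webrip", "hdtv"},
    "codec": {"x264", "x265", "h264", "hevc"},
    "distributor": {"nf", "amzn"},
}
# Assumed patterns for the two non-enumerable buckets.
YEAR_RE = re.compile(r"^(19|20)\d{2}$")
SXXEXX_RE = re.compile(r"^s\d{2}e\d{2}$", re.IGNORECASE)


def tag_token(token: str) -> str:
    """Return the first bucket that claims the token, else 'unknown'."""
    lowered = token.lower()
    for role, values in KB.items():
        if lowered in values:
            return role
    if YEAR_RE.match(token):
        return "year"
    if SXXEXX_RE.match(token):
        return "sxxexx"
    return "unknown"


def annotate_sketch(name: str) -> tuple[list[tuple[str, str]], str]:
    """Tag every dot-separated token and derive the title."""
    tagged = [(tok, tag_token(tok)) for tok in name.split(".") if tok]
    title: list[str] = []
    for tok, role in tagged:
        if role == "unknown":
            title.append(tok)
        elif title:
            # The leftmost UNKNOWN run just ended; later runs are ignored.
            break
    return tagged, " ".join(title)
```

With these assumed buckets, "Show.S01E02.Pilot.720p" yields the title
"Show": "Pilot" is also UNKNOWN but belongs to a later run, so it is
not part of the title.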

SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.

- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
  (deutschland franchise box, sleaford YT slug, super_mario bilingual,
  predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows an xfail_reason property; the parametrized suite
  applies pytest.mark.xfail(strict=False) automatically when it is set
2026-05-20 01:03:25 +02:00
parent fd3bd1ad8c
commit 3737f66851
9 changed files with 231 additions and 502 deletions
+8
@@ -39,6 +39,14 @@ class ReleaseFixture:
    def routing(self) -> dict:
        return self.data.get("routing", {})

    @property
    def xfail_reason(self) -> str | None:
        """If set, the fixture is expected to fail and is wrapped with
        ``pytest.mark.xfail`` by the test runner. Used for known
        unsupported pathological cases (typically the PATH OF PAIN bucket).
        """
        return self.data.get("xfail_reason")

    def materialize(self, root: Path) -> None:
        """Create the fixture's ``tree`` as empty files/dirs under ``root``."""
        for entry in self.tree: