refactor(release): simplify SHITTY to dict-driven token tagging
Replace the ~480-line legacy heuristic block in services.py with a small dict-driven pass in pipeline._annotate_shitty: each token is looked up against the kb buckets (resolutions / sources / codecs / distributors / year / sxxexx) with first-match-wins semantics, the leftmost contiguous UNKNOWN run becomes the title, done. SHITTY's scope is intentionally narrow — releases that *look* like scene names but don't have a registered group schema. Anything more exotic (parenthesized tech, bare-dashed title fragments, YT slugs, franchise boxes) is PATH OF PAIN territory and stays out of here. - annotate() no longer returns None; SHITTY is the always-on fallback - services.py shrunk from ~525 to ~85 lines (legacy extractors gone) - 4 fixtures get xfail markers documenting PoP-grade pathologies (deutschland franchise box, sleaford YT slug, super_mario bilingual, predator space-separators — the last one moved from shitty/ → pop/) - ReleaseFixture grows xfail_reason; the parametrized suite wires the pytest.mark.xfail(strict=False) automatically
This commit is contained in:
@@ -1,5 +1,10 @@
|
||||
release_name: "Deutschland 83-86-89 (2015) Season 1-3 S01-S03 (1080p BluRay x265 HEVC 10bit AAC 5.1 German Kappa)"
|
||||
|
||||
# Out of SHITTY scope by design: parenthesized tech blocks, group name as
|
||||
# the last bare word inside parens, year-suffix range in title, dual
|
||||
# season expression. PATH OF PAIN handles this via LLM pre-analysis.
|
||||
xfail_reason: "PoP-grade pathological franchise box-set, beyond simple-dict SHITTY"
|
||||
|
||||
# Pathological franchise box-set:
|
||||
# - Title contains year-suffix range "83-86-89" (3 years glued)
|
||||
# - Season range expressed twice: "Season 1-3" AND "S01-S03"
|
||||
|
||||
Reference in New Issue
Block a user