Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.
SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.
- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
(deutschland franchise box, sleaford YT slug, super_mario bilingual,
predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
pytest.mark.xfail(strict=False) automatically
- test_release.py / test_release_fixtures.py: module-level
_KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
into parse_release. test_show_folder_name_strips_windows_chars renamed
to test_show_folder_name_uses_already_safe_title to reflect the
Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
class (helper deleted), add tmdb_title_safe arg to
_resolve_series_folder calls.
987 passed, 8 skipped.
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.
Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
= full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
COMPLETE token correctly drives tv_complete + REPACK + 10bit)
Also adds shitty + path_of_pain to the per-bucket sanity assertion.
Suite: 1020 passed, 8 skipped.
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.
Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.
Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).
Also tracks CHANGELOG.md under [Unreleased] / Added.