Commit Graph

4 Commits

Author SHA1 Message Date
francwa 3737f66851 refactor(release): simplify SHITTY to dict-driven token tagging
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.

SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.

- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
  (deutschland franchise box, sleaford YT slug, super_mario bilingual,
  predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
  pytest.mark.xfail(strict=False) automatically
2026-05-20 01:03:25 +02:00
francwa 6802933acd test(release): adapt suite to explicit ReleaseKnowledge injection
- test_release.py / test_release_fixtures.py: module-level
  _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
  into parse_release. test_show_folder_name_strips_windows_chars renamed
  to test_show_folder_name_uses_already_safe_title to reflect the
  Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
  detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
  title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
  class (helper deleted), add tmdb_title_safe arg to
  _resolve_series_folder calls.

987 passed, 8 skipped.
2026-05-19 22:05:26 +02:00
francwa 273510dff8 test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.

Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
  = full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
  COMPLETE token correctly drives tv_complete + REPACK + 10bit)

Also adds shitty + path_of_pain to the per-bucket sanity assertion.

Suite: 1020 passed, 8 skipped.
2026-05-18 15:57:56 +02:00
francwa 7bc50fd5b8 test: add real-world release fixtures (EASY bucket)
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.

Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.

Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).

Also tracks CHANGELOG.md under [Unreleased] / Added.
2026-05-18 15:36:19 +02:00