fix(release/parser): pre-strip apostrophes so titles like Don't parse cleanly
Apostrophes are in the forbidden-chars list, which made any release with a title like "Don't" or "L'avare" short-circuit to the AI fallback (parse_path=ai, everything UNKNOWN). They are now stripped up front from the name before the well-formed check and tokenize, so the parse completes normally. The raw name is preserved on the VO; only the title field loses its apostrophe. parse_path becomes 'sanitized' when an apostrophe was stripped, to surface that the parser cleaned something up. Fixtures updated: - shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse (title=Honey.Dont, year=2025, quality=2160p, source=WEBRip, codec=x265, group=Amen). - path_of_pain/the_prodigy_full_chaos/ — went from total failure to partial success (title, year, source, codec extracted). Remaining gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked separately in tech debt.
This commit is contained in:
+16
-18
@@ -1,28 +1,26 @@
|
||||
release_name: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
|
||||
|
||||
# Apocalypse case combining every horror:
|
||||
# - Unescaped apostrophe ("World's") → forces parse_path="ai" fallback
|
||||
# - Spaces AND dashes used as separators inconsistently
|
||||
# - "Blu-ray" with a dash (vs. canonical BluRay)
|
||||
# - "1080i" interlaced flag (not 1080p)
|
||||
# - "DTS-HD MA 5.1" multi-word audio codec
|
||||
# - " - GROUP.mkv" trailing format (space-dash-space before group)
|
||||
# Apocalypse case combining every horror — partially tamed by the
|
||||
# apostrophe fix. Remaining gaps (still PoP-worthy):
|
||||
# - "1080i" interlaced flag (not in quality KB)
|
||||
# - "Blu-ray" with a dash (vs. canonical BluRay) — recognized as source
|
||||
# but with the dash form
|
||||
# - "DTS-HD MA 5.1" multi-word audio codec — the trailing "HD" leaks
|
||||
# into the group
|
||||
# - Trailing .mkv extension survives in title
|
||||
# Result: total degeneration — UNKNOWN across the board, title=raw input.
|
||||
# Once the apostrophe + multi-word-audio + 1080i are handled this fixture
|
||||
# should be revisited. For now: anti-regression of the failure shape.
|
||||
# - " - GROUP" trailing format (space-dash-space before group)
|
||||
parsed:
|
||||
title: "The Prodigy World's on Fire 2011 Blu-ray Remux 1080i AVC DTS-HD MA 5.1 - KRaLiMaRKo.mkv"
|
||||
year: null
|
||||
title: "The.Prodigy.Worlds.on.Fire"
|
||||
year: 2011
|
||||
season: null
|
||||
episode: null
|
||||
quality: null
|
||||
source: null
|
||||
codec: null
|
||||
group: "UNKNOWN"
|
||||
tech_string: ""
|
||||
media_type: "unknown"
|
||||
parse_path: "ai"
|
||||
source: "Blu-ray"
|
||||
codec: "AVC"
|
||||
group: "HD"
|
||||
tech_string: "Blu-ray.AVC"
|
||||
media_type: "movie"
|
||||
parse_path: "sanitized"
|
||||
is_season_pack: false
|
||||
|
||||
tree:
|
||||
|
||||
Reference in New Issue
Block a user