fix(release/parser): pre-strip apostrophes so titles like Don't parse cleanly
Apostrophes are in the forbidden-chars list, which made any release with a title like "Don't" or "L'avare" short-circuit to the AI fallback (parse_path=ai, everything UNKNOWN). They are now stripped up front from the name before the well-formed check and tokenize, so the parse completes normally. The raw name is preserved on the VO; only the title field loses its apostrophe.

parse_path becomes 'sanitized' when an apostrophe was stripped, to surface that the parser cleaned something up.

Fixtures updated:
- shitty/honey_uhd_hdr/: went from total UNKNOWN to a clean parse (title=Honey.Dont, year=2025, quality=2160p, source=WEBRip, codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/: went from total failure to partial success (title, year, source, codec extracted).

Remaining gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked separately in tech debt.
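
For readers without the parser source at hand, the flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in this commit: `FORBIDDEN_CHARS`, `ParsedRelease`, `pre_strip_apostrophes`, `is_well_formed`, `parse`, the `"direct"` path name, and the toy title extraction are all hypothetical stand-ins. Only the ordering (strip, then well-formed check, then tokenize) and the `ai` / `sanitized` values come from the message.

```python
from dataclasses import dataclass

# Hypothetical subset of the forbidden-chars list; the real list is not shown here.
FORBIDDEN_CHARS = frozenset({"'", "!", "?"})


@dataclass(frozen=True)
class ParsedRelease:
    """Toy stand-in for the release VO (assumed shape)."""
    raw: str         # original name, apostrophes preserved
    title: str       # title as extracted from the cleaned name
    parse_path: str  # "direct" (assumed name), "sanitized", or "ai"


def pre_strip_apostrophes(name: str) -> tuple[str, bool]:
    """Return the name without apostrophes and whether anything was removed."""
    cleaned = name.replace("'", "")
    return cleaned, cleaned != name


def is_well_formed(name: str) -> bool:
    """A name is well formed here if it contains no forbidden character."""
    return not any(ch in FORBIDDEN_CHARS for ch in name)


def parse(name: str) -> ParsedRelease:
    # Strip first, so an apostrophe alone no longer fails the check below.
    cleaned, stripped = pre_strip_apostrophes(name)
    if not is_well_formed(cleaned):
        # Other forbidden characters still route to the AI fallback.
        return ParsedRelease(raw=name, title="UNKNOWN", parse_path="ai")

    # Toy tokenizer: the title is every dot-separated token before the year.
    title_tokens = []
    for token in cleaned.split("."):
        if len(token) == 4 and token.isdigit():
            break
        title_tokens.append(token)

    return ParsedRelease(
        raw=name,
        title=".".join(title_tokens),
        parse_path="sanitized" if stripped else "direct",
    )
```

In this sketch, `parse("Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen")` keeps the apostrophe in `raw`, yields `title="Honey.Dont"`, and reports `parse_path="sanitized"`.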
@@ -23,6 +23,16 @@ callers).
 with intermediate values implied. Fixture
 `shitty/archer_multi_episode/` updated from anti-regression-of-bug
 to anti-regression-of-fix.
+- **Apostrophes in titles no longer push the release through the AI
+fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
+previously parsed with `parse_path="ai"` and everything UNKNOWN
+because `'` is in the forbidden-chars list. Apostrophes are now
+pre-stripped before the well-formed check, so the parse completes
+normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
+the title text loses its apostrophe. `parse_path` becomes
+`sanitized` to surface the cleanup. Side win: PoP fixture
+`the_prodigy_full_chaos/` also moves from total failure to a
+partially-correct parse (year, source, codec extracted).
 - **Season-range markers (`Sxx-yy`) are now recognized as
 `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
 parsed as `media_type=movie` with `S01-06` glued onto the title.