fix(release/parser): drop pure-punctuation TITLE tokens at assembly

Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD']. The standalone '-' tokens ended up
in title_parts and leaked into the joined title ('Vinyl.-'). We can't
add '-' to the separator list (it would break codec-GROUP parsing), so
we filter at assembly instead: a TITLE token with no alphanumeric
characters carries no title content and is dropped.
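
A minimal sketch of the assembly-time filter described above. This is
not the parser's actual code: the helper names (has_title_content,
assemble_title) are hypothetical, and joining with '.' is an assumption
inferred from the 'Vinyl.-' title.

```python
from __future__ import annotations


def has_title_content(token: str) -> bool:
    """Return True if the token has at least one alphanumeric character."""
    return any(ch.isalnum() for ch in token)


def assemble_title(title_parts: list[str]) -> str:
    """Join TITLE tokens, skipping pure-punctuation ones such as '-'.

    Hypothetical helper: the '.' joiner is assumed, not taken from the parser.
    """
    return ".".join(tok for tok in title_parts if has_title_content(tok))


# Title tokens as they would be collected for "Vinyl - 1x01 - FHD"
print(assemble_title(["Vinyl", "-"]))
```

Because the check runs per token rather than per character, a token
like 'x264-GROUP' passes through untouched. Only tokens made up
entirely of punctuation are dropped.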

Side win: the same logic also removes the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.

Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
2026-05-20 23:24:40 +02:00
parent 5bbdc9081f
commit b1c7f35ffb
4 changed files with 26 additions and 10 deletions
@@ -1,11 +1,12 @@
release_name: "Vinyl - 1x01 - FHD"
-# Tech debt: surrounding ' - ' separators leave a stray '-' token attached
-# to the title ("Vinyl.-"). NxNN form correctly identifies S01E01; everything
-# tech-side empty (no quality token in KB — "FHD" not yet known). Anti-regression
-# the current degenerate title so a future fix is intentional.
+# Surrounding ' - ' separators in human-friendly release names left stray
+# '-' tokens attached to the title. They are now dropped at assembly time
+# (pure-punctuation TITLE tokens carry no content). NxNN form correctly
+# identifies S01E01; tech-side stays empty (no quality token in KB — "FHD"
+# not yet known).
parsed:
-title: "Vinyl.-"
+title: "Vinyl"
year: null
season: 1
episode: 1