feat(release): add fullwidth vertical bar ｜ (U+FF5C) to separators

CJK release names sometimes use the fullwidth vertical bar as a token
separator, as do occasional decorative YouTube-style uploads. Adding
the codepoint to separators.yaml lets the tokenizer split on it
instead of leaving the wide pipe glued onto an adjacent token.

The tokenizer in alfred/domain/release/parser/pipeline.py iterates
the separator list as plain strings (no regex), so a multi-byte
UTF-8 separator works without any code change.
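Here is why plain-string matching is enough, as a minimal Python sketch. It is not the code in `pipeline.py`. The `tokenize` function and the `SEPARATORS` subset are hypothetical. The sketch assumes the tokenizer splits on each literal separator in turn and then drops empty tokens.

```python
# Hypothetical sketch, NOT the real tokenizer in
# alfred/domain/release/parser/pipeline.py. Assumes separators are matched
# as literal strings (never compiled as regex) and applied one after another.

# Illustrative subset only; "." and " " are assumed, "\uff5c" is the fullwidth bar.
SEPARATORS = [".", " ", "-", "(", ")", "_", "\uff5c"]


def tokenize(name: str, separators: list[str]) -> list[str]:
    """Split `name` on every literal separator, dropping empty tokens."""
    tokens = [name]
    for sep in separators:
        # str.split treats `sep` literally, so "(" needs no escaping, and
        # "\uff5c" is a single code point in a Python str even though it
        # encodes to 3 bytes in UTF-8.
        tokens = [part for tok in tokens for part in tok.split(sep)]
    # Adjacent separators (e.g. "(" at a token start) leave empty strings behind.
    return [tok for tok in tokens if tok]


print(tokenize("作品名｜Some.Title.2021.1080p", SEPARATORS))
# ['作品名', 'Some', 'Title', '2021', '1080p']
print(tokenize("作品名｜Some.Title.2021.1080p", SEPARATORS[:-1]))
# ['作品名｜Some', 'Title', '2021', '1080p']  (wide pipe stays glued on)
```

Because no separator is interpreted as a pattern, adding the new codepoint to the list is the only change needed.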
2026-05-21 08:05:56 +02:00
parent 88f156b7a4
commit 3dc73a5214
2 changed files with 10 additions and 0 deletions
@@ -48,6 +48,15 @@ callers).
### Added
- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
token separator.** Added to `alfred/knowledge/release/separators.yaml`
so CJK release names (and the occasional decorative YouTube-style use)
tokenize cleanly instead of leaving the wide pipe glued onto an
adjacent token. The tokenizer in
`alfred/domain/release/parser/pipeline.py` already iterates the
separator list as plain strings (no regex), so a multi-byte UTF-8
separator works without any code change.
- **`InspectedResult.recommended_action` property** — derived hint that
collapses the orchestrator's go / wait / skip decision into a single
value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
@@ -21,3 +21,4 @@ separators:
- "(" # parenthesis-embedded (year, edition): (2020) (Director's Cut)
- ")"
- "_" # underscore-as-space (old usenet, some Asian releases)
- "｜" # fullwidth vertical bar U+FF5C (CJK release names, occasional decorative use)