feat(release): add fullwidth vertical bar | (U+FF5C) to separators
CJK release names sometimes use the fullwidth vertical bar as a token separator, as do occasional decorative YouTube-style uploads. Adding the codepoint to separators.yaml lets the tokenizer split on it instead of leaving the wide pipe glued onto an adjacent token. The tokenizer in alfred/domain/release/parser/pipeline.py iterates the separator list as plain strings (no regex), so a multi-byte UTF-8 separator works without any code change.
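A minimal sketch of the plain-string splitting described above. It is not the code in `pipeline.py`: the `tokenize` helper and the `SEPARATORS` list are hypothetical stand-ins, and the list holds only the entries visible in this diff. It illustrates why a separator that occupies three bytes in UTF-8 needs no special handling when separators are matched as strings rather than compiled into a regex.

```python
from __future__ import annotations

# Hypothetical stand-in for the list loaded from separators.yaml; only the
# entries visible in this diff are included.
FULLWIDTH_BAR = "\uff5c"  # fullwidth vertical bar, U+FF5C
SEPARATORS = ["(", ")", "_", FULLWIDTH_BAR]


def tokenize(name: str, separators: list[str] = SEPARATORS) -> list[str]:
    """Split a release name on every separator using plain string matching."""
    tokens = [name]
    for sep in separators:
        # str.split operates on code points, so a separator that encodes to
        # several bytes in UTF-8 behaves exactly like an ASCII one.
        tokens = [part for token in tokens for part in token.split(sep)]
    # Drop empty strings left by adjacent, leading, or trailing separators.
    return [token for token in tokens if token]


# Illustrative release name, not taken from real data.
print(tokenize("Some_Show\uff5c2020\uff5c(Director's_Cut)"))
```

Because no pattern is compiled, characters such as `(` or `|` never need escaping, which is the property the commit relies on.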
@@ -48,6 +48,15 @@ callers).

### Added

- **Fullwidth vertical bar `|` (U+FF5C) is now a recognized release-name
token separator.** Added to `alfred/knowledge/release/separators.yaml`
so CJK release names (and the occasional decorative YouTube-style use)
tokenize cleanly instead of leaving the wide pipe glued onto an
adjacent token. The tokenizer in
`alfred/domain/release/parser/pipeline.py` already iterates the
separator list as plain strings (no regex), so a multi-byte UTF-8
separator works without any code change.

- **`InspectedResult.recommended_action` property** — derived hint that
collapses the orchestrator's go / wait / skip decision into a single
value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
@@ -21,3 +21,4 @@ separators:
- "(" # parenthesis-embedded (year, edition): (2020) (Director's Cut)
- ")"
- "_" # underscore-as-space (old usenet, some Asian releases)
- "|" # fullwidth vertical bar U+FF5C (CJK release names, occasional decorative use)