From 3dc73a5214b11ce4b2e2abe99a0a628647e06a00 Mon Sep 17 00:00:00 2001
From: Francwa
Date: Thu, 21 May 2026 08:05:56 +0200
Subject: [PATCH] =?UTF-8?q?feat(release):=20add=20fullwidth=20vertical=20b?=
 =?UTF-8?q?ar=20=EF=BD=9C=20(U+FF5C)=20to=20separators?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CJK release names sometimes use the fullwidth vertical bar as a token
separator, as do occasional decorative YouTube-style uploads. Adding the
codepoint to separators.yaml lets the tokenizer split on it instead of
leaving the wide pipe glued onto an adjacent token.

The tokenizer in alfred/domain/release/parser/pipeline.py iterates the
separator list as plain strings (no regex), so a multi-byte UTF-8
separator works without any code change.
---
 CHANGELOG.md                             | 9 +++++++++
 alfred/knowledge/release/separators.yaml | 1 +
 2 files changed, 10 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a769da9..a357e05 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -48,6 +48,15 @@ callers).
 
 ### Added
 
+- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
+  token separator.** Added to `alfred/knowledge/release/separators.yaml`
+  so CJK release names (and the occasional decorative YouTube-style use)
+  tokenize cleanly instead of leaving the wide pipe glued onto an
+  adjacent token. The tokenizer in
+  `alfred/domain/release/parser/pipeline.py` already iterates the
+  separator list as plain strings (no regex), so a multi-byte UTF-8
+  separator works without any code change.
+
 - **`InspectedResult.recommended_action` property** — derived hint that
   collapses the orchestrator's go / wait / skip decision into a single
   value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
diff --git a/alfred/knowledge/release/separators.yaml b/alfred/knowledge/release/separators.yaml
index 19ae243..dd117a2 100644
--- a/alfred/knowledge/release/separators.yaml
+++ b/alfred/knowledge/release/separators.yaml
@@ -21,3 +21,4 @@ separators:
   - "(" # parenthesis-embedded (year, edition): (2020) (Director's Cut)
   - ")"
   - "_" # underscore-as-space (old usenet, some Asian releases)
+  - "｜" # fullwidth vertical bar U+FF5C (CJK release names, occasional decorative use)
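
For illustration, the plain-string splitting described in the commit message can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in alfred/domain/release/parser/pipeline.py: the function name `split_on_separators` and the shortened separator list are hypothetical and exist only for this example.

```python
from typing import Iterable, List

# Hypothetical, shortened stand-in for separators.yaml: only the entries
# visible in this diff, plus the new fullwidth vertical bar (U+FF5C).
SEPARATORS = ["(", ")", "_", "\uff5c"]


def split_on_separators(name: str, separators: Iterable[str]) -> List[str]:
    """Split a release name on each separator in turn, using str.split only."""
    tokens = [name]
    for sep in separators:
        next_tokens: List[str] = []
        for token in tokens:
            # str.split operates on code points, so a separator that is
            # multi-byte in UTF-8 needs no special handling. Empty pieces
            # (from adjacent separators) are dropped.
            next_tokens.extend(part for part in token.split(sep) if part)
        tokens = next_tokens
    return tokens


tokens = split_on_separators("Title_2020\uff5c1080p", SEPARATORS)
```

Because no regex is involved, nothing has to be escaped, and the ASCII pipe `|` stays untouched unless it is listed as a separator in its own right.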