alfred/alfred/knowledge/iso_languages.yaml
francwa e07c9ec77b chore: sprint cleanup — language unification, parser unification, fossils removal
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).
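The lookup and filename rules above can be sketched as follows. This is a minimal illustration, not the project's code: the Language and LanguageRegistry names come from the commit message, but the field names, the find() method, and the canonical_subtitle_name() helper are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    """Value object keyed by its ISO 639-2/B code (hypothetical shape)."""

    iso: str
    english_name: str
    aliases: tuple[str, ...] = ()


class LanguageRegistry:
    """Case-insensitive lookup over the union of {iso, aliases}."""

    def __init__(self, languages: list[Language]) -> None:
        self._index: dict[str, Language] = {}
        for lang in languages:
            for key in (lang.iso, *lang.aliases):
                self._index[key.casefold()] = lang

    def find(self, token: str) -> Optional[Language]:
        # Hypothetical method name; returns None for unknown tokens.
        return self._index.get(token.strip().casefold())


def canonical_subtitle_name(registry: LanguageRegistry, filename: str) -> Optional[str]:
    """Map a legacy name such as 'fr.srt' to its canonical '{iso639_2b}.srt' form."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return None
    lang = registry.find(stem)
    return f"{lang.iso}.{ext}" if lang else None


if __name__ == "__main__":
    registry = LanguageRegistry(
        [
            Language("fre", "French", ("fr", "fra", "french", "francais")),
            Language("hin", "Hindi", ("hi", "hindi")),
        ]
    )
    print(canonical_subtitle_name(registry, "fr.srt"))  # fre.srt
    print(registry.find("HI").english_name)  # Hindi
```

Because 'hi' resolves to Hindi through the alias index, a scanner that also treated 'hi' as an SDH marker would classify the same token two ways, which is the conflict the fix above removes.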

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.
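A rough sketch of separator-driven tokenization and the NxNN forms, under stated assumptions: the separator list mirrors the one quoted above, but the function names are hypothetical, the real parse_release() pipeline is not shown in this commit, and reading NxNNxNN as "season, first episode, last episode" is a guess made for the example.

```python
import re
from typing import Optional

# Mirrors the separators listed above; in the project these would come from
# separators.yaml rather than a constant.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize(name: str, separators: list[str] = SEPARATORS) -> list[str]:
    """Split on runs of any configured single-character separator, dropping empty tokens."""
    pattern = "[" + "".join(re.escape(s) for s in separators) + "]+"
    return [token for token in re.split(pattern, name) if token]


# Assumption: NxNN is season x episode, NxNNxNN adds a closing episode number.
_NXNN = re.compile(r"(\d{1,2})x(\d{2})(?:x(\d{2}))?", re.IGNORECASE)


def parse_nxnn(token: str) -> Optional[tuple[int, int, Optional[int]]]:
    """Return (season, episode, last_episode) for NxNN / NxNNxNN tokens, else None."""
    match = _NXNN.fullmatch(token)
    if match is None:
        return None
    season, episode, last = match.groups()
    return int(season), int(episode), int(last) if last else None


if __name__ == "__main__":
    print(tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]"))
    # ['The', 'Father', '2020', '1080p', 'WEBRip', '5', '1', 'YTS', 'MX']
    print(parse_nxnn("1x05"))     # (1, 5, None)
    print(parse_nxnn("1x05x06"))  # (1, 5, 6)
```

Note that this naive split also breaks '5.1' and 'YTS.MX' apart; the sketch does not attempt whatever regrouping the actual tokenizer may perform for such tokens.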

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00


name: iso_languages
version: "1.0"
description: >
  Canonical language table. The primary key is the ISO 639-2/B code (3 letters,
  bibliographic form), which is what ffprobe emits and is the project-wide
  canonical form. Aliases include the ISO 639-1 code, the ISO 639-2/T
  (terminologic) variant when it differs, English/native names, and any common
  spelling encountered in release names or filesystems.
  Lookups are case-insensitive and operate on the union of {iso, aliases}.
languages:
  fre:
    english_name: French
    native_name: Français
    aliases: [fr, fra, french, francais]
  eng:
    english_name: English
    native_name: English
    aliases: [en, english]
  spa:
    english_name: Spanish
    native_name: Español
    aliases: [es, spanish, espanol, español, castellano]
  ger:
    english_name: German
    native_name: Deutsch
    aliases: [de, deu, german, deutsch]
  ita:
    english_name: Italian
    native_name: Italiano
    aliases: [it, italian, italiano]
  por:
    english_name: Portuguese
    native_name: Português
    aliases: [pt, portuguese, portugues, português, brazilian, brasileiro]
  dut:
    english_name: Dutch
    native_name: Nederlands
    aliases: [nl, nld, dutch, nederlands]
  nor:
    english_name: Norwegian
    native_name: Norsk
    # "no" is quoted: YAML 1.1 loaders (e.g. PyYAML) read a bare no as boolean false.
    aliases: ["no", norwegian, norsk]
  swe:
    english_name: Swedish
    native_name: Svenska
    aliases: [sv, swedish, svenska]
  dan:
    english_name: Danish
    native_name: Dansk
    aliases: [da, danish, dansk]
  fin:
    english_name: Finnish
    native_name: Suomi
    aliases: [fi, finnish, suomi]
  pol:
    english_name: Polish
    native_name: Polski
    aliases: [pl, polish, polski]
  cze:
    english_name: Czech
    native_name: Čeština
    aliases: [cs, ces, czech, cestina, čeština]
  slo:
    english_name: Slovak
    native_name: Slovenčina
    aliases: [sk, slk, slovak, slovencina, slovenčina]
  hun:
    english_name: Hungarian
    native_name: Magyar
    aliases: [hu, hungarian, magyar]
  rum:
    english_name: Romanian
    native_name: Română
    aliases: [ro, ron, romanian, romana, română]
  bul:
    english_name: Bulgarian
    native_name: Български
    aliases: [bg, bulgarian, български]
  hrv:
    english_name: Croatian
    native_name: Hrvatski
    aliases: [hr, croatian, hrvatski]
  srp:
    english_name: Serbian
    native_name: Srpski
    aliases: [sr, serbian, srpski, српски]
  slv:
    english_name: Slovenian
    native_name: Slovenščina
    aliases: [sl, slovenian, slovensko, slovenščina]
  est:
    english_name: Estonian
    native_name: Eesti
    aliases: [et, estonian, eesti]
  lav:
    english_name: Latvian
    native_name: Latviešu
    aliases: [lv, latvian, latviesu, latviešu]
  lit:
    english_name: Lithuanian
    native_name: Lietuvių
    aliases: [lt, lithuanian, lietuviu, lietuvių]
  mac:
    english_name: Macedonian
    native_name: Македонски
    aliases: [mk, mkd, macedonian, македонски]
  jpn:
    english_name: Japanese
    native_name: 日本語
    aliases: [ja, japanese, 日本語]
  chi:
    english_name: Chinese
    native_name: 中文
    aliases: [zh, zho, chinese, simplified, traditional, mandarin, 中文]
  yue:
    english_name: Cantonese
    native_name: 粵語
    aliases: [cantonese, 粵語, 粤语]
  kor:
    english_name: Korean
    native_name: 한국어
    aliases: [ko, korean, 한국어]
  ara:
    english_name: Arabic
    native_name: العربية
    aliases: [ar, arabic, العربية]
  tur:
    english_name: Turkish
    native_name: Türkçe
    aliases: [tr, turkish, turkce, türkçe]
  gre:
    english_name: Greek
    native_name: Ελληνικά
    aliases: [el, ell, greek, ελληνικά]
  ind:
    english_name: Indonesian
    native_name: Bahasa Indonesia
    aliases: [id, indonesian, bahasa]
  may:
    english_name: Malay
    native_name: Bahasa Melayu
    aliases: [ms, msa, malay, melayu]
  rus:
    english_name: Russian
    native_name: Русский
    aliases: [ru, russian, русский]
  vie:
    english_name: Vietnamese
    native_name: Tiếng Việt
    aliases: [vi, vietnamese, tiếng việt]
  heb:
    english_name: Hebrew
    native_name: עברית
    aliases: [he, hebrew, עברית]
  tam:
    english_name: Tamil
    native_name: தமிழ்
    aliases: [ta, tamil, தமிழ்]
  tel:
    english_name: Telugu
    native_name: తెలుగు
    aliases: [te, telugu, తెలుగు]
  tha:
    english_name: Thai
    native_name: ไทย
    aliases: [th, thai, ไทย]
  hin:
    english_name: Hindi
    native_name: हिन्दी
    aliases: [hi, hindi, हिन्दी]
  ukr:
    english_name: Ukrainian
    native_name: Українська
    aliases: [uk, ukrainian, українська]
  und:
    english_name: Undetermined
    native_name: Undetermined
    aliases: [unknown, unk]
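
Since lookups run over the union of {iso, aliases}, an alias declared under two languages would make resolution ambiguous, and a key that the YAML loader did not read as a string (a YAML 1.1 loader such as PyYAML reads a bare no as boolean false) could never match a token. A load-time check along these lines would catch both cases. It is only a sketch: build_alias_index is a hypothetical helper, and it assumes the languages mapping has already been parsed into plain dicts.

```python
def build_alias_index(languages: dict) -> dict[str, str]:
    """Map every case-folded iso code and alias to its owning ISO 639-2/B code.

    Raises TypeError for non-string keys and ValueError when two languages
    claim the same key.
    """
    index: dict[str, str] = {}
    for iso, entry in languages.items():
        for key in [iso, *entry.get("aliases", [])]:
            if not isinstance(key, str):
                raise TypeError(f"{iso!r}: key {key!r} is not a string; quote it in the YAML")
            owner = index.setdefault(key.casefold(), iso)
            if owner != iso:
                raise ValueError(f"{key!r} is claimed by both {owner!r} and {iso!r}")
    return index


if __name__ == "__main__":
    # Hand-written stand-in for the parsed table (two entries only).
    sample = {
        "fre": {"english_name": "French", "aliases": ["fr", "fra", "french", "francais"]},
        "eng": {"english_name": "English", "aliases": ["en", "english"]},
    }
    index = build_alias_index(sample)
    print(index["fra"])  # fre
```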