e07c9ec77b
Several weeks of work accumulated without being committed. Grouped here
for clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2: ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry;
  subtitles.yaml only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written
  as {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3: Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name
  parsing.
- alfred/knowledge/release/separators.yaml declares the token separators
  used by the tokenizer (., space, [, ], (, ), _). New conventions can be
  added without code changes.
- Tokenizer now splits on any configured separator instead of
  name.split('.'). Releases like
  'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via the direct
  path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects
  truly forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
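The data-driven tokenizer described under P1 #3 can be illustrated with a minimal sketch. This is not the project's code: the `tokenize` function and the hard-coded `SEPARATORS` list are hypothetical stand-ins, and the real tokenizer is said to load its separators from separators.yaml and to run site-tag extraction first. The sketch only shows the idea of splitting on any configured separator rather than on '.' alone.

```python
import re

# Separators as listed in the commit message. Hard-coded here for
# illustration only; the project is described as loading them from YAML.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize(name: str, separators: list[str] = SEPARATORS) -> list[str]:
    """Split a release name on any configured separator, dropping empty tokens."""
    # Build a character class from the escaped separators; '+' collapses
    # runs such as ') [' into a single split point.
    pattern = "[" + "".join(re.escape(sep) for sep in separators) + "]+"
    return [token for token in re.split(pattern, name) if token]


print(tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]"))
# ['The', 'Father', '2020', '1080p', 'WEBRip', '5', '1', 'YTS', 'MX']
```

Note that this naive version also splits "5.1" and "YTS.MX" on the dot. How the real parser keeps such tokens together is not stated in the commit message, so the sketch does not attempt it.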
221 lines
4.9 KiB
YAML
name: iso_languages
version: "1.0"
description: >
  Canonical language table. The primary key is the ISO 639-2/B code (3 letters,
  bibliographic form), which is what ffprobe emits and is the project-wide
  canonical form. Aliases include the ISO 639-1 code, the ISO 639-2/T
  (terminologic) variant when it differs, English/native names, and any common
  spelling encountered in release names or filesystems.
  Lookups are case-insensitive and operate on the union of {iso, aliases}.

languages:
  fre:
    english_name: French
    native_name: Français
    aliases: [fr, fra, french, francais]

  eng:
    english_name: English
    native_name: English
    aliases: [en, english]

  spa:
    english_name: Spanish
    native_name: Español
    aliases: [es, spanish, espanol, español, castellano]

  ger:
    english_name: German
    native_name: Deutsch
    aliases: [de, deu, german, deutsch]

  ita:
    english_name: Italian
    native_name: Italiano
    aliases: [it, italian, italiano]

  por:
    english_name: Portuguese
    native_name: Português
    aliases: [pt, portuguese, portugues, português, brazilian, brasileiro]

  dut:
    english_name: Dutch
    native_name: Nederlands
    aliases: [nl, nld, dutch, nederlands]

  nor:
    english_name: Norwegian
    native_name: Norsk
    aliases: ["no", norwegian, norsk]  # quoted: a bare no is a boolean under YAML 1.1

  swe:
    english_name: Swedish
    native_name: Svenska
    aliases: [sv, swedish, svenska]

  dan:
    english_name: Danish
    native_name: Dansk
    aliases: [da, danish, dansk]

  fin:
    english_name: Finnish
    native_name: Suomi
    aliases: [fi, finnish, suomi]

  pol:
    english_name: Polish
    native_name: Polski
    aliases: [pl, polish, polski]

  cze:
    english_name: Czech
    native_name: Čeština
    aliases: [cs, ces, czech, cestina, čeština]

  slo:
    english_name: Slovak
    native_name: Slovenčina
    aliases: [sk, slk, slovak, slovencina, slovenčina]

  hun:
    english_name: Hungarian
    native_name: Magyar
    aliases: [hu, hungarian, magyar]

  rum:
    english_name: Romanian
    native_name: Română
    aliases: [ro, ron, romanian, romana, română]

  bul:
    english_name: Bulgarian
    native_name: Български
    aliases: [bg, bulgarian, български]

  hrv:
    english_name: Croatian
    native_name: Hrvatski
    aliases: [hr, croatian, hrvatski]

  srp:
    english_name: Serbian
    native_name: Srpski
    aliases: [sr, serbian, srpski, српски]

  slv:
    english_name: Slovenian
    native_name: Slovenščina
    aliases: [sl, slovenian, slovensko, slovenščina]

  est:
    english_name: Estonian
    native_name: Eesti
    aliases: [et, estonian, eesti]

  lav:
    english_name: Latvian
    native_name: Latviešu
    aliases: [lv, latvian, latviesu, latviešu]

  lit:
    english_name: Lithuanian
    native_name: Lietuvių
    aliases: [lt, lithuanian, lietuviu, lietuvių]

  mac:
    english_name: Macedonian
    native_name: Македонски
    aliases: [mk, mkd, macedonian, македонски]

  jpn:
    english_name: Japanese
    native_name: 日本語
    aliases: [ja, japanese, 日本語]

  chi:
    english_name: Chinese
    native_name: 中文
    aliases: [zh, zho, chinese, simplified, traditional, mandarin, 中文]

  yue:
    english_name: Cantonese
    native_name: 粵語
    aliases: [cantonese, 粵語, 粤语]

  kor:
    english_name: Korean
    native_name: 한국어
    aliases: [ko, korean, 한국어]

  ara:
    english_name: Arabic
    native_name: العربية
    aliases: [ar, arabic, العربية]

  tur:
    english_name: Turkish
    native_name: Türkçe
    aliases: [tr, turkish, turkce, türkçe]

  gre:
    english_name: Greek
    native_name: Ελληνικά
    aliases: [el, ell, greek, ελληνικά]

  ind:
    english_name: Indonesian
    native_name: Bahasa Indonesia
    aliases: [id, indonesian, bahasa]

  may:
    english_name: Malay
    native_name: Bahasa Melayu
    aliases: [ms, msa, malay, melayu]

  rus:
    english_name: Russian
    native_name: Русский
    aliases: [ru, russian, русский]

  vie:
    english_name: Vietnamese
    native_name: Tiếng Việt
    aliases: [vi, vietnamese, tiếng việt]

  heb:
    english_name: Hebrew
    native_name: עברית
    aliases: [he, hebrew, עברית]

  tam:
    english_name: Tamil
    native_name: தமிழ்
    aliases: [ta, tamil, தமிழ்]

  tel:
    english_name: Telugu
    native_name: తెలుగు
    aliases: [te, telugu, తెలుగు]

  tha:
    english_name: Thai
    native_name: ไทย
    aliases: [th, thai, ไทย]

  hin:
    english_name: Hindi
    native_name: हिन्दी
    aliases: [hi, hindi, हिन्दी]

  ukr:
    english_name: Ukrainian
    native_name: Українська
    aliases: [uk, ukrainian, українська]

  und:
    english_name: Undetermined
    native_name: Undetermined
    aliases: [unknown, unk]
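The lookup rule stated in the file's description (case-insensitive, over the union of {iso, aliases}) can be sketched as follows. The `Language` dataclass and `LanguageIndex` class are hypothetical names chosen for illustration; the project's actual Language VO and LanguageRegistry are not shown in this commit view and may differ. A few table entries are inlined as Python data so the example needs no YAML loader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    iso: str  # ISO 639-2/B code, e.g. "fre"
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()


class LanguageIndex:
    """Hypothetical stand-in for a registry built from iso_languages.yaml."""

    def __init__(self, languages: list[Language]) -> None:
        self._by_key: dict[str, Language] = {}
        for lang in languages:
            # Index the canonical code and every alias, case-folded.
            for key in (lang.iso, *lang.aliases):
                self._by_key[key.casefold()] = lang

    def resolve(self, token: str) -> Optional[Language]:
        """Return the matching language, or None if the token is unknown."""
        return self._by_key.get(token.strip().casefold())


# A few entries copied from the table above.
index = LanguageIndex([
    Language("fre", "French", "Français", ("fr", "fra", "french", "francais")),
    Language("ger", "German", "Deutsch", ("de", "deu", "german", "deutsch")),
    Language("hin", "Hindi", "हिन्दी", ("hi", "hindi", "हिन्दी")),
])

print(index.resolve("FR").iso)       # fre
print(index.resolve("Deutsch").iso)  # ger
print(index.resolve("hi").iso)       # hin
print(index.resolve("xx"))           # None
```

Under this reading, a legacy filename such as fr.srt resolves to the canonical code fre, and the token hi resolves to Hindi, which is consistent with the SDH conflict mentioned in the commit message.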