chore: sprint cleanup (language unification, parser unification, fossil removal)

Several weeks of accumulated, previously uncommitted work, grouped here for
clarity. See CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).
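
The lookup described above can be illustrated with a minimal sketch. It is not
the actual alfred implementation: the field names, the `lookup` method, the
`subtitle_filename` helper, and the inlined entries are assumptions made for
illustration (the real data lives in iso_languages.yaml). It shows how a
registry keyed by ISO 639-2/B codes can still accept legacy aliases such as
`fr`, and why `hi` has to resolve to Hindi rather than mark SDH.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Immutable value object keyed by its ISO 639-2/B code (hypothetical shape)."""
    code: str                       # canonical ISO 639-2/B code, e.g. "fre"
    name: str
    aliases: frozenset = frozenset()


class LanguageRegistry:
    """Resolves any known code or alias to a canonical Language."""

    def __init__(self, languages):
        self._by_token = {}
        for lang in languages:
            # Index the canonical code and every alias, case-insensitively.
            for token in {lang.code, *lang.aliases}:
                self._by_token[token.lower()] = lang

    def lookup(self, token):
        """Return the matching Language, or None for an unknown token."""
        return self._by_token.get(token.lower())


# Illustrative entries only; the project loads these from iso_languages.yaml.
registry = LanguageRegistry([
    Language("fre", "French", frozenset({"fr", "fra", "french"})),
    Language("eng", "English", frozenset({"en", "english"})),
    Language("hin", "Hindi", frozenset({"hi", "hindi"})),
])


def subtitle_filename(token):
    """Return the canonical '{iso639_2b}.srt' name, or None if unknown."""
    lang = registry.lookup(token)
    return f"{lang.code}.srt" if lang else None
```

Under these assumptions, `subtitle_filename("fr")` yields `fre.srt`, so a legacy
`fr.srt` can be read through the alias while new files use the canonical code.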

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; the well-formedness check rejects
  only truly forbidden characters.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.
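
A rough sketch of the separator-driven split and the NxNN forms mentioned
above. This is not the code in parse_release(): the separator list is inlined
instead of being read from separators.yaml, and the function names, the regex,
and the reading of NxNNxNN as "season x first episode x last episode" are
assumptions for illustration.

```python
import re

# Separators as listed in the commit message; inlined here, whereas the
# project declares them in separators.yaml.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize(name, separators=SEPARATORS):
    """Split a release name on runs of any configured separator."""
    pattern = "(?:" + "|".join(re.escape(sep) for sep in separators) + ")+"
    # re.split leaves empty strings at the edges; drop them.
    return [tok for tok in re.split(pattern, name) if tok]


# Hypothetical pattern for the alternative forms: 1x05 or 1x05x06.
_ALT_EPISODE = re.compile(r"^(\d{1,2})x(\d{2})(?:x(\d{2}))?$", re.IGNORECASE)


def parse_alt_season_episode(token):
    """Return (season, episode, end_episode_or_None), or None if no match."""
    match = _ALT_EPISODE.match(token)
    if match is None:
        return None
    season, first, last = match.groups()
    return (int(season), int(first), int(last) if last else None)
```

A naive split like this also breaks `5.1` and `YTS.MX` into two tokens each,
which is one reason an earlier pass such as the site-tag extraction is useful
before tokenizing.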

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite, organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00
parent ba6f016d49
commit e07c9ec77b
99 changed files with 8833 additions and 6533 deletions
+23 -10
@@ -1,4 +1,17 @@
-"""Edge case tests for tools."""
+"""Edge-case tests for the agent tools.
+
+Exercises pathological and adversarial inputs for the public tool surface:
+
+- **TestFindTorrentEdgeCases** — wraps ``find_torrent`` (mocking the use
+  case) to assert behavior on absent results, malformed responses, and
+  unexpected exceptions.
+- **TestFilesystemEdgeCases** — pushes ``set_path_for_folder`` /
+  ``list_folder`` through traversal attempts, null bytes, hidden files,
+  broken/escaping symlinks, unicode, deep paths, and oversize inputs.
+
+Uses the current LTM API (``memory.ltm.workspace.download``); the legacy
+flat attribute ``download_folder`` no longer exists.
+"""
from unittest.mock import Mock, patch
@@ -271,7 +284,7 @@ class TestFilesystemEdgeCases:
"""Should list hidden files."""
hidden_file = real_folder["downloads"] / ".hidden"
hidden_file.touch()
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download")
@@ -285,7 +298,7 @@ class TestFilesystemEdgeCases:
except OSError:
pytest.skip("Cannot create symlinks")
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download")
@@ -301,7 +314,7 @@ class TestFilesystemEdgeCases:
try:
os.chmod(no_read, 0o000)
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download")
@@ -312,7 +325,7 @@ class TestFilesystemEdgeCases:
def test_list_folder_case_sensitivity(self, memory, real_folder):
"""Should handle case sensitivity correctly."""
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
# Try with different cases
result_lower = fs_tools.list_folder("download")
@@ -324,7 +337,7 @@ class TestFilesystemEdgeCases:
"""Should handle spaces in path."""
space_dir = real_folder["downloads"] / "folder with spaces"
space_dir.mkdir()
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download", "folder with spaces")
@@ -332,7 +345,7 @@ class TestFilesystemEdgeCases:
def test_path_traversal_with_encoded_chars(self, memory, real_folder):
"""Should block URL-encoded traversal attempts."""
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
# Various encoding attempts
attempts = [
@@ -352,7 +365,7 @@ class TestFilesystemEdgeCases:
def test_path_with_null_byte(self, memory, real_folder):
"""Should block null byte injection."""
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download", "file\x00.txt")
@@ -366,7 +379,7 @@ class TestFilesystemEdgeCases:
deep_path = deep_path / f"level{i}"
deep_path.mkdir(parents=True)
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
# Navigate to deep path
relative_path = "/".join([f"level{i}" for i in range(20)])
@@ -380,7 +393,7 @@ class TestFilesystemEdgeCases:
for i in range(1000):
(real_folder["downloads"] / f"file_{i:04d}.txt").touch()
-memory.ltm.download_folder = str(real_folder["downloads"])
+memory.ltm.workspace.download = str(real_folder["downloads"])
result = fs_tools.list_folder("download")
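
The traversal, escaping-symlink, and null-byte cases exercised by these tests
all reduce to one question: does the requested path still resolve inside the
configured folder? The diff does not show how fs_tools answers it, so the
following is only a sketch of one common approach; `resolve_inside` is a
hypothetical helper, not a function from the project, and it does not decode
URL-encoded input.

```python
from pathlib import Path


def resolve_inside(root, relative):
    """Return the resolved path if it stays under root, else None."""
    if "\x00" in relative:
        return None  # a null byte is never valid inside a path
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so an
    # escaping symlink ends up outside root_path and is rejected below.
    candidate = (root_path / relative).resolve()
    if candidate == root_path or root_path in candidate.parents:
        return candidate
    return None
```

Comparing resolved paths, rather than scanning the input for `..`, lets the
same check cover absolute paths and symlinks that point outside the folder.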