chore: sprint cleanup (language unification, parser unification, fossil removal)

Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).
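
A minimal sketch of the alias-to-canonical lookup described above, assuming a
much simpler shape than the real Language VO and LanguageRegistry. The class
fields, the lookup() method, the canonical_subtitle_name() helper, and the
three inlined entries are hypothetical; the real data lives in
iso_languages.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Hypothetical value object; the real Language VO may carry more fields."""

    iso639_2b: str
    aliases: frozenset


class LanguageRegistry:
    """Hypothetical registry mapping any known token to one canonical Language."""

    def __init__(self, languages):
        self._by_token = {}
        for lang in languages:
            # Index the canonical code and every alias, case-insensitively.
            for token in {lang.iso639_2b, *lang.aliases}:
                self._by_token[token.lower()] = lang

    def lookup(self, token):
        """Return the Language for a token, or None if the token is unknown."""
        return self._by_token.get(token.lower())


# Illustrative entries only (assumption); not the contents of iso_languages.yaml.
REGISTRY = LanguageRegistry(
    [
        Language("fre", frozenset({"fr", "fra", "french"})),
        Language("eng", frozenset({"en", "english"})),
        Language("hin", frozenset({"hi", "hindi"})),
    ]
)


def canonical_subtitle_name(token):
    """Build the {iso639_2b}.srt filename for any alias, or None if unknown."""
    lang = REGISTRY.lookup(token)
    return f"{lang.iso639_2b}.srt" if lang else None
```

With a single lookup table, a legacy 'fr' resolves to fre.srt, and 'hi' can
only ever mean Hindi, which is why it cannot double as an SDH marker.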

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; the well-formedness check rejects only
  truly forbidden characters.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.
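
A rough sketch of the two parsing changes above, not the actual parse_release()
code. The separator list is inlined here instead of being read from
separators.yaml, the function names tokenize() and parse_alt_season_episode()
are hypothetical, and reading NxNNxNN as season x first episode x last episode
is an assumption.

```python
import re

# Assumed contents of separators.yaml, inlined so the sketch runs standalone.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]

# One alternation built from the configured separators; runs collapse together.
_SPLIT_RE = re.compile("(?:" + "|".join(re.escape(s) for s in SEPARATORS) + ")+")

# NxNN or NxNNxNN, e.g. 1x05 or 2x03x04 (third group assumed to be an end episode).
_ALT_SE_RE = re.compile(r"(\d{1,2})x(\d{2})(?:x(\d{2}))?", re.IGNORECASE)


def tokenize(name):
    """Split a release name on any configured separator, dropping empty tokens."""
    return [tok for tok in _SPLIT_RE.split(name) if tok]


def parse_alt_season_episode(token):
    """Return (season, episode, end_episode) for NxNN forms, else None."""
    match = _ALT_SE_RE.fullmatch(token)
    if match is None:
        return None
    season, episode, end_episode = match.groups()
    return (
        int(season),
        int(episode),
        int(end_episode) if end_episode is not None else None,
    )
```

This naive split also breaks '5.1' and 'YTS.MX' into two tokens each, which is
one reason extracting the site tag before tokenizing matters.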

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite, organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00
parent ba6f016d49
commit e07c9ec77b
99 changed files with 8833 additions and 6533 deletions
+82 -16
@@ -1,5 +1,20 @@
"""
Tests for alfred.agent.registry — tool registration and JSON schema generation.
"""Tests for ``alfred.agent.registry`` — tool registration and JSON schema gen.
Two suites:
1. **TestCreateToolFromFunction** — Unit-tests the schema extraction from a
bare Python function: name resolution, docstring → description, required
versus optional parameters, ``Optional[X]`` / ``X | None`` stripping, and
the Python-to-JSON-Schema type mapping (``str/int/float/bool/list/dict``
→ ``string/integer/number/boolean/array/object``).
2. **TestMakeTools** — Integration check on the live registry: every tool
declared in ``make_tools(settings)`` is a real ``Tool`` instance with a
callable ``func`` and a name matching its dict key, and a known core set
of tools is always present. Resolver tests target the four media-typed
resolvers (``resolve_movie_destination``, ``_season_``, ``_episode_``,
``_series_``), not the legacy unified ``resolve_destination`` which no
longer exists.
"""
from alfred.agent.registry import Tool, _create_tool_from_function, make_tools
@@ -95,12 +110,43 @@ class TestCreateToolFromFunction:
t = _create_tool_from_function(tool)
assert t.parameters["properties"]["x"]["type"] == "boolean"
def test_unknown_type_defaults_to_string(self):
def test_type_mapping_list(self):
def tool(x: list) -> dict:
"""T."""
return {}
t = _create_tool_from_function(tool)
assert t.parameters["properties"]["x"]["type"] == "array"
def test_type_mapping_dict(self):
def tool(x: dict) -> dict:
"""T."""
return {}
t = _create_tool_from_function(tool)
assert t.parameters["properties"]["x"]["type"] == "object"
def test_unknown_type_defaults_to_string(self):
"""Custom classes without a JSON-Schema mapping fall back to ``string``."""
class CustomType:
pass
def tool(x: CustomType) -> dict:
"""T."""
return {}
t = _create_tool_from_function(tool)
assert t.parameters["properties"]["x"]["type"] == "string"
def test_optional_annotation_unwrapped(self):
def tool(x: str | None = None) -> dict:
"""T."""
return {}
t = _create_tool_from_function(tool)
# ``str | None`` should unwrap to ``str``, not fall back to "string"
# by accident — the mapping is intentional.
assert t.parameters["properties"]["x"]["type"] == "string"
def test_no_annotation_defaults_to_string(self):
@@ -150,23 +196,39 @@ class TestMakeTools:
assert isinstance(tools, dict)
def test_all_expected_tools_present(self):
"""Core tool set that the agent needs to perform the end-to-end flow."""
tools = make_tools(settings)
expected = {
# Folder & filesystem
"set_path_for_folder",
"list_folder",
"resolve_destination",
"move_media",
"move_to_destination",
# Resolvers (one per media type — no unified resolve_destination)
"resolve_season_destination",
"resolve_episode_destination",
"resolve_movie_destination",
"resolve_series_destination",
# Subtitles & seeding
"manage_subtitles",
"create_seed_links",
"learn",
# API
"find_media_imdb_id",
"find_torrent",
"add_torrent_by_index",
"add_torrent_to_qbittorrent",
"get_torrent_by_index",
# Conversation
"set_language",
}
assert expected.issubset(tools.keys())
missing = expected - tools.keys()
assert not missing, f"missing tools: {sorted(missing)}"
def test_no_legacy_unified_resolver(self):
"""The single ``resolve_destination`` tool was replaced by four typed resolvers."""
tools = make_tools(settings)
assert "resolve_destination" not in tools
def test_each_tool_is_tool_instance(self):
tools = make_tools(settings)
@@ -183,21 +245,25 @@ class TestMakeTools:
for key, tool in tools.items():
assert tool.name == key
def test_resolve_destination_schema(self):
def test_resolve_movie_destination_schema(self):
tools = make_tools(settings)
t = tools["resolve_destination"]
props = t.parameters["properties"]
t = tools["resolve_movie_destination"]
# Required args common to all movie resolutions.
for required_arg in ("source_file", "tmdb_title", "tmdb_year"):
assert required_arg in t.parameters["required"], (
f"resolve_movie_destination should require {required_arg}"
)
# tmdb_year is typed as int.
assert t.parameters["properties"]["tmdb_year"]["type"] == "integer"
def test_resolve_episode_destination_schema(self):
tools = make_tools(settings)
t = tools["resolve_episode_destination"]
required = t.parameters["required"]
# Required args
assert "release_name" in required
# An episode resolution needs at least the source file and the show
# identification (title/year). Season/episode numbers also required.
assert "source_file" in required
assert "tmdb_title" in required
assert "tmdb_year" in required
# Optional args not required
assert "tmdb_episode_title" not in required
assert "confirmed_folder" not in required
# tmdb_year is int
assert props["tmdb_year"]["type"] == "integer"
def test_move_media_schema(self):
tools = make_tools(settings)