Files
alfred/alfred/agent/llm/deepseek.py
T
francwa e07c9ec77b chore: sprint cleanup — language unification, parser unification, fossils removal
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00

153 lines
5.6 KiB
Python

"""DeepSeek LLM client with robust error handling."""
import logging
from typing import Any
import requests
from requests.exceptions import HTTPError, RequestException, Timeout
from alfred.settings import Settings
from alfred.settings import settings as default_settings
from .exceptions import LLMAPIError, LLMConfigurationError
logger = logging.getLogger(__name__)
class DeepSeekClient:
"""Client for interacting with DeepSeek API."""
def __init__(
self,
api_key: str | None = None,
base_url: str | None = None,
model: str | None = None,
timeout: int | None = None,
settings: Settings | None = None,
):
"""
Initialize DeepSeek client.
Args:
api_key: API key for authentication (defaults to settings)
base_url: Base URL for API (defaults to settings)
model: Model name to use (defaults to settings)
timeout: Request timeout in seconds (defaults to settings)
Raises:
LLMConfigurationError: If API key is missing
"""
self.settings = settings or default_settings
self.api_key = api_key or self.settings.deepseek_api_key
self.base_url = base_url or self.settings.deepseek_base_url
self.model = model or self.settings.deepseek_model
self.timeout = timeout or self.settings.request_timeout
if not self.api_key:
raise LLMConfigurationError(
"DeepSeek API key is required. Set DEEPSEEK_API_KEY environment variable."
)
if not self.base_url:
raise LLMConfigurationError(
"DeepSeek base URL is required. Set DEEPSEEK_BASE_URL environment variable."
)
logger.info(f"DeepSeek client initialized with model: {self.model}")
def complete( # noqa: PLR0912
self, messages: list[dict[str, Any]], tools: list[dict[str, Any]] | None = None
) -> dict[str, Any]:
"""
Generate a completion from the LLM.
Args:
messages: List of message dicts with 'role' and 'content' keys
tools: Optional list of tool specifications (OpenAI format)
Returns:
OpenAI-compatible message dict with 'role', 'content', and optionally 'tool_calls'
Raises:
LLMAPIError: If API request fails
ValueError: If messages format is invalid
"""
# Validate messages format
if not messages:
raise ValueError("Messages list cannot be empty")
for msg in messages:
if not isinstance(msg, dict):
raise ValueError(f"Each message must be a dict, got {type(msg)}")
if "role" not in msg:
raise ValueError(f"Message must have 'role' key, got {msg.keys()}")
# Allow system, user, assistant, and tool roles
if msg["role"] not in ("system", "user", "assistant", "tool"):
raise ValueError(f"Invalid role: {msg['role']}")
# Content is optional for tool messages (they may have tool_call_id instead)
if msg["role"] != "tool" and "content" not in msg:
raise ValueError(
f"Non-tool message must have 'content' key, got {msg.keys()}"
)
url = f"{self.base_url}/v1/chat/completions"
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
payload = {
"model": self.model,
"messages": messages,
"temperature": self.settings.llm_temperature,
}
# Add tools if provided
if tools:
payload["tools"] = tools
try:
logger.debug(
f"Sending request to {url} with {len(messages)} messages and {len(tools) if tools else 0} tools"
)
response = requests.post(
url, headers=headers, json=payload, timeout=self.timeout
)
response.raise_for_status()
data = response.json()
# Validate response structure
if "choices" not in data or not data["choices"]:
raise LLMAPIError("Invalid API response: missing 'choices'")
if "message" not in data["choices"][0]:
raise LLMAPIError("Invalid API response: missing 'message' in choice")
# Return the full message dict (OpenAI format)
message = data["choices"][0]["message"]
logger.debug(f"Received response: {message.get('content', '')[:100]}...")
return message
except Timeout as e:
logger.error(f"Request timeout after {self.timeout}s: {e}")
raise LLMAPIError(f"Request timeout after {self.timeout} seconds") from e
except HTTPError as e:
logger.error(f"HTTP error from DeepSeek API: {e}")
if e.response is not None:
try:
error_data = e.response.json()
error_msg = error_data.get("error", {}).get("message", str(e))
except Exception:
error_msg = str(e)
raise LLMAPIError(f"DeepSeek API error: {error_msg}") from e
raise LLMAPIError(f"HTTP error: {e}") from e
except RequestException as e:
logger.error(f"Request failed: {e}")
raise LLMAPIError(f"Failed to connect to DeepSeek API: {e}") from e
except (KeyError, IndexError, TypeError) as e:
logger.error(f"Failed to parse API response: {e}")
raise LLMAPIError(f"Invalid API response format: {e}") from e