Commit Graph

17 Commits

Author SHA1 Message Date
francwa c22b2b78eb refactor(domain): Phase 3 — TVShow/Movie aggregates become TMDB-only
Filesystem-side concerns (file paths, tracks, quality, mode, added_at)
move to the releases/ domain added in Phase 1; the TMDB aggregates now
carry only identity + TMDB catalog facts.

Domain entities:
- TVShow: tmdb_id: TmdbId required (primary key), imdb_id: ImdbId | None
  optional, status: str = "unknown" added.
- Season: episode_count: int = 0 added (TMDB-cached); audio_tracks,
  subtitle_tracks, mode property removed.
- Episode: slimmed to identity + title. file_path/file_size/tracks
  removed. No longer inherits MediaWithTracks.
- Movie: tmdb_id required, imdb_id optional. file_path/file_size/quality/
  added_at/audio_tracks/subtitle_tracks removed. get_filename() now
  returns "Title.Year" — quality moves to MovieRelease.

Builders:
- TVShowBuilder requires tmdb_id: TmdbId; imdb_id/status optional.
- SeasonBuilder.set_episode_count(int) replaces set_audio_tracks /
  set_subtitle_tracks.

No-coercion contract: TVShow(tmdb_id=1396) raises — callers pass
TmdbId(1396). No ergonomic shim per the no-shims rule.
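
The no-coercion contract could look roughly like the sketch below. It is illustrative only: the reduced `TmdbId` and `TVShow` classes, the `title` field, and the choice of `TypeError` are assumptions, since the message above only states that the call raises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    """Hypothetical value object wrapping a positive TMDB integer id."""

    value: int

    def __post_init__(self) -> None:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError("TmdbId value must be an int")
        if self.value <= 0:
            raise ValueError("TmdbId value must be positive")


@dataclass(frozen=True)
class TVShow:
    """Reduced stand-in for the aggregate: identity fields only."""

    tmdb_id: TmdbId
    title: str
    status: str = "unknown"

    def __post_init__(self) -> None:
        # No coercion: a bare int is rejected instead of being wrapped.
        if not isinstance(self.tmdb_id, TmdbId):
            raise TypeError(
                f"tmdb_id must be a TmdbId, got {type(self.tmdb_id).__name__}"
            )


show = TVShow(tmdb_id=TmdbId(1396), title="Example Show")

try:
    TVShow(tmdb_id=1396, title="Example Show")
except TypeError as exc:
    rejected = str(exc)
```

Rejecting the bare int at construction keeps the wrapping decision at the call site, which is what the no-shims rule asks for.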

Cascade fixes:
- MediaOrganizer test fixtures updated to new Movie/TVShow shapes.
- Movie.get_filename() re-added (without Quality) so MediaOrganizer
  keeps working until Phase 4 rewires it through MovieRelease.

Quarantined (deleted in Phase 4 alongside v1 dot_alfred):
- tests/application/library/test_rescan.py — module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_repository.py —
  module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_serializer.py —
  module-level skip.

Suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase 3
quarantines), 4 xfailed. CHANGELOG updated under [Unreleased].
2026-05-25 19:54:35 +02:00
francwa 2f160644da feat(dot_alfred/v2): bump SCHEMA_VERSION to 2 — added_at on MovieRelease
Phase 3 prep: Movie aggregate is about to become TMDB-only (no
filesystem fields). added_at is a release-time observation, not a
TMDB-aggregate concern, so it moves to MovieRelease +
MovieReleaseSidecar.

- Add added_at: datetime (required) to MovieRelease with a
  type-check in __post_init__.
- Add added_at: datetime (required) to MovieReleaseSidecar.
- Bump SCHEMA_VERSION 1 → 2 with a version-history note.
- Bridge round-trips added_at via Pydantic mode="json" (datetime
  → ISO 8601 string).
- Tests: update MovieRelease fixtures, add a validator test, add
  an added_at round-trip test, switch hard-coded `1` assertions
  to SCHEMA_VERSION for future-proofing.

No v1 sidecars in the wild yet — no migration code needed.
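
A minimal sketch of the `added_at` type-check and an ISO 8601 round-trip, assuming a reduced `MovieRelease` with a hypothetical `file_path` field. The real bridge goes through Pydantic; plain `isoformat()` / `fromisoformat()` stand in for it here, and the helper names are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MovieRelease:
    """Reduced stand-in; the real class carries more fields."""

    file_path: str
    added_at: datetime

    def __post_init__(self) -> None:
        # Required field with an explicit type-check, no string coercion.
        if not isinstance(self.added_at, datetime):
            raise TypeError("added_at must be a datetime")


def to_sidecar(release: MovieRelease) -> dict:
    # datetime becomes an ISO 8601 string on the way out.
    return {
        "file_path": release.file_path,
        "added_at": release.added_at.isoformat(),
    }


def from_sidecar(data: dict) -> MovieRelease:
    # The ISO 8601 string is parsed back into a datetime on the way in.
    return MovieRelease(
        file_path=data["file_path"],
        added_at=datetime.fromisoformat(data["added_at"]),
    )


release = MovieRelease(
    file_path="Movie.2020.mkv",
    added_at=datetime(2026, 5, 25, 17, 47, 25, tzinfo=timezone.utc),
)
restored = from_sidecar(to_sidecar(release))
```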
2026-05-25 19:47:25 +02:00
francwa e65c1df229 feat(.alfred v2 — Phase 2): Pydantic sidecars, atomic repos, auto-heal index
Spec: specs/dot_alfred_v2.md (Phase 2).

New package alfred/infrastructure/persistence/dot_alfred/v2/:
  * sidecar_release.py / sidecar_root.py — Pydantic DTOs
    (extra="forbid", frozen=True) for per-item sidecars and the
    library-root index. schema_version enforced via model_validator.
  * serializer.py — read_yaml / atomic_write_yaml (.tmp + os.replace).
    SidecarSchemaError wraps YAML + Pydantic errors uniformly.
  * bridge.py — lossless domain <-> sidecar for SeriesRelease /
    MovieRelease; projection-only show_index_entry_from /
    movie_index_entry_from with multi-episode-file flattening.
  * repository.py — DotAlfredSeriesReleaseRepository /
    DotAlfredMovieReleaseRepository (log+skip on corruption),
    DotAlfredTVShowLibraryIndex / DotAlfredMovieLibraryIndex with
    silent auto-heal on missing/corrupt index reads. Writes never
    auto-heal (read paths handle that).
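
The `.tmp` + `os.replace` write pattern mentioned for serializer.py can be sketched as follows. This is a generic illustration that writes plain text rather than YAML; `atomic_write_text` is a hypothetical name, not the project's `atomic_write_yaml`.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text to target so readers never see a half-written file."""
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    # os.replace swaps the file in one step when both paths share a filesystem.
    os.replace(tmp, target)


with tempfile.TemporaryDirectory() as folder:
    sidecar = Path(folder) / ".alfred"
    atomic_write_text(sidecar, "schema_version: 2\n")
    written = sidecar.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in Path(folder).iterdir())
```

Keeping the temporary file next to the target (same directory) is what keeps the rename on one filesystem.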

TMDB client extensions:
  * TmdbSeasonInfo / TmdbShowInfo DTOs + pure parse_tv_show_info.
  * TMDBClient.get_tv_show_info aggregates /tv/{id} +
    /tv/{id}/external_ids.

Domain change:
  * SubtitleTrack gains is_sdh: bool = False, populated from
    ffprobe's hearing_impaired disposition. Required for v2 sidecar
    parity (spec replaces v1's type: "sdh" with explicit flag).
    Default keeps every existing caller unchanged.
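
As an illustration of the `is_sdh` mapping: ffprobe's JSON output exposes per-stream flags under a `disposition` object, including `forced` and `hearing_impaired`. The helper below is a hypothetical sketch of reading those flags, not the project's prober code.

```python
def subtitle_flags(stream: dict) -> dict:
    """Derive boolean flags from one ffprobe stream entry."""
    # Older ffprobe builds may omit the key, so default to "not set".
    disposition = stream.get("disposition") or {}
    return {
        "is_forced": bool(disposition.get("forced", 0)),
        "is_sdh": bool(disposition.get("hearing_impaired", 0)),
    }


sdh_stream = {
    "codec_type": "subtitle",
    "disposition": {"forced": 0, "hearing_impaired": 1},
}
plain_stream = {"codec_type": "subtitle"}

sdh_flags = subtitle_flags(sdh_stream)
plain_flags = subtitle_flags(plain_stream)
```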

Tests: 37 new v2 integration tests on tmp_path (round-trips, atomic
writes, schema mismatch handling, anchor warnings, auto-heal paths)
plus 16 TMDB DTO tests. Full suite: 1240 -> 1277 passed.

Implementation notes filed in .claude/specs/dot_alfred_v2_notes.md
(strict=True trade-off, upsert signature deviation from spec, etc.).

Phases 3-5 (TVShow/Movie refactor to TMDB-only, rescan_show rewrite,
v1 deletion + wiring) are next.
2026-05-25 16:01:39 +02:00
francwa 3622c95154 chore(lint): Lint the shit out of it 2026-05-24 15:21:58 +02:00
francwa c7c11180d9 feat(persistence): add DotAlfredTVShowRepository (filesystem-backed)
Step 3 of specs/dot_alfred.md. Concrete TVShowRepository
implementation reading and writing per-show .alfred YAML files under
a configurable library_root. Writes are atomic (.alfred.tmp +
os.replace), reads tolerate corrupted/wrong-schema sidecars (log +
skip), and the repo never invents a folder name — save(show)
requires the target folder to exist beforehand (raises
ShowFolderUnknown otherwise), matching the spec's
MediaOrganizer-then-sidecar split.

Cold folders without a sidecar are skipped by find_all and yield
None from find_by_imdb_id — the upcoming rescan_show tool (step 4)
will own the opt-in rebuild path.

A small bridge module translates between the rich domain TVShow
(AudioTrack/SubtitleTrack with full ffprobe minutiae) and the
compact sidecar shape (language-only audio, embedded-only subs with
type derived from is_forced). The bridge is intentionally lossy on
probe details the sidecar does not store, per the spec's
factual-only philosophy.

20 integration tests on tmp_path: round-trip save/find,
cold-folder/unknown-id returns, find_all skipping
(corrupted/schema-violating sidecars), delete/exists, atomic write
(no .alfred.tmp leftover), overwrite, and folder-name fallbacks
(get_folder_name guess + full-scan rescue when renamed).
2026-05-22 17:16:41 +02:00
francwa b0e275bd11 feat(persistence): add .alfred sidecar serializer (DTO ↔ dict)
Step 2 of the specs/dot_alfred.md plan. Pure-dict in/out
(serialize(sidecar) -> dict, deserialize(data) -> ShowSidecar);
YAML I/O lives in the repository layer (step 3) and is kept out of
the serializer so it stays trivially testable.

DTOs mirror the YAML schema field-for-field:
- ShowSidecar (root: imdb_id, tmdb_id, schema_version, seasons)
- SeasonSidecar (number, path, optional audio/subtitles, optional episodes)
- EpisodeSidecar (number, path, optional audio/subtitles)
- SubtitleEntry (language, source, type)

The sidecar acts as a scan cache: it stores only what is genuinely
costly to recompute — folder/file paths (skipping the FS walk) and
probed track metadata (skipping ffprobe). Release identifiers
(group, source, quality, codec) live in folder/file names and are
derived on demand by the parser; they are deliberately absent from
the schema and rejected as unknown keys on deserialize.

The serializer is strict on schema: unknown keys at any level raise
SidecarSchemaError, missing required fields raise clearly, and bool
cannot sneak in as a season/episode number. Optional fields
(tmdb_id, empty audio/subtitles/episodes) are omitted from the
output rather than emitted as null / [].
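
A sketch of the strictness rules for one episode entry, using the field names listed above (number, path, optional audio/subtitles). The function names and error messages are assumptions; only `SidecarSchemaError` is named in this commit.

```python
class SidecarSchemaError(ValueError):
    """Raised when a sidecar dict does not match the schema."""


_REQUIRED = {"number", "path"}
_OPTIONAL = {"audio", "subtitles"}


def check_episode(data: dict) -> dict:
    unknown = set(data) - _REQUIRED - _OPTIONAL
    if unknown:
        raise SidecarSchemaError(f"unknown keys: {sorted(unknown)}")
    missing = _REQUIRED - set(data)
    if missing:
        raise SidecarSchemaError(f"missing keys: {sorted(missing)}")
    number = data["number"]
    # isinstance(True, int) is True in Python, so bool is excluded first.
    if isinstance(number, bool) or not isinstance(number, int):
        raise SidecarSchemaError("number must be an int")
    return data


def dump_episode(number: int, path: str, audio=(), subtitles=()) -> dict:
    out = {"number": number, "path": path}
    # Empty optional fields are omitted instead of being written as [].
    if audio:
        out["audio"] = list(audio)
    if subtitles:
        out["subtitles"] = list(subtitles)
    return out


compact = dump_episode(1, "S01E01.mkv")
full = dump_episode(2, "S01E02.mkv", audio=["eng"])
```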

Tests cover round-trip equivalence (DTO → dict → DTO and DTO → YAML
text → DTO), the Foundation S01 PACK case (real-world fixture with
mixed sub types — superset captured at season scope), and a
Breaking Bad S05 EPISODIC case. An on-disk tmp_path fixture
recreates the Foundation folder structure with placeholder files,
ready to be reused by the upcoming repository walk tests in step 3.
2026-05-22 16:56:56 +02:00
francwa 6c12c18a27 refactor(tv_shows): freeze aggregate, builder-only construction, drop ShowTracker fields
The TVShow aggregate is now fully immutable. TVShow, Season and Episode
are @dataclass(frozen=True), children stored as ordered tuples sorted
by number. All construction goes through TVShowBuilder / SeasonBuilder
(new module), which expose from_existing() to seed from a current
frozen aggregate and apply modifications.
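
A reduced sketch of the frozen-aggregate-plus-builder idea, assuming a show with only a title and seasons. Method names other than `from_existing` are hypothetical, and the real builders cover many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Season:
    number: int


@dataclass(frozen=True)
class TVShow:
    title: str
    seasons: tuple[Season, ...] = ()


class TVShowBuilder:
    def __init__(self, title: str) -> None:
        self._title = title
        self._seasons: dict[int, Season] = {}

    @classmethod
    def from_existing(cls, show: TVShow) -> "TVShowBuilder":
        # Seed the builder from a frozen aggregate, then modify the copy.
        builder = cls(show.title)
        for season in show.seasons:
            builder.add_season(season)
        return builder

    def add_season(self, season: Season) -> "TVShowBuilder":
        # Keyed by number, so adding the same number twice is last-write-wins.
        self._seasons[season.number] = season
        return self

    def build(self) -> TVShow:
        # Children are stored as a tuple sorted by number.
        ordered = tuple(sorted(self._seasons.values(), key=lambda s: s.number))
        return TVShow(title=self._title, seasons=ordered)


show = TVShowBuilder("Example").add_season(Season(2)).add_season(Season(1)).build()
updated = TVShowBuilder.from_existing(show).add_season(Season(3)).build()
```

The original `show` is untouched by the second build; every change produces a new aggregate.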

ShowTracker-territory fields are stripped from the domain: ShowStatus,
CollectionStatus, expected_seasons/episodes, aired_episodes,
collection_status(), is_complete_series(), missing_episodes(),
is_ongoing(), is_ended(), Season.name, the aired<=expected validation,
and the TMDB status string mapping. These will reappear in a dedicated
ShowTracker layer (to be designed) combining the .alfred sidecar with
live TMDB data.

New SeasonMode enum (PACK / EPISODIC) computed at read time from the
season's structural shape — never stored, the YAML sidecar encodes the
mode via presence/absence of the episodes: block.
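
The derived mode could be sketched as a property over the season's shape. Which shape maps to which mode is an assumption here: the sketch treats a season with per-episode entries as EPISODIC and one without as PACK.

```python
from dataclasses import dataclass
from enum import Enum


class SeasonMode(Enum):
    PACK = "pack"
    EPISODIC = "episodic"


@dataclass(frozen=True)
class Episode:
    number: int


@dataclass(frozen=True)
class Season:
    number: int
    episodes: tuple[Episode, ...] = ()

    @property
    def mode(self) -> SeasonMode:
        # Computed on every read, never stored on the instance.
        return SeasonMode.EPISODIC if self.episodes else SeasonMode.PACK


pack_season = Season(number=1)
episodic_season = Season(number=5, episodes=(Episode(1), Episode(2)))
```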

Test suite for the domain entirely rewritten to cover frozen invariants,
builder ordering, last-write-wins, from_existing round-trip, and
SeasonMode derivation. Full suite still green (1078 passed).
2026-05-22 16:09:37 +02:00
francwa 88f156b7a4 refactor(subtitles): rename SubtitleCandidate → SubtitleScanResult
The old name conflated 'might become a placed subtitle' with 'what a
scan pass produced'. The class is the output of a scan/identify pass —
language/format may still be None while classification is in progress,
confidence reflects classifier certainty, raw_tokens holds filename
fragments under analysis. SubtitleScanResult says that directly.

Pure rename + refreshed docstring; no behavior change. Touches the
domain entity, the matcher/identifier/utils services, the
manage_subtitles use case, the placer, the metadata store, the
shared-media cross-ref comment, and 7 test modules.
2026-05-21 08:05:46 +02:00
francwa 18267d0165 refactor(language): LanguageRepository port + SubtitleKnowledgeBase wired to it
Mirror the MediaProber / FilesystemScanner pattern for language lookup:

- New Protocol `LanguageRepository` in alfred.domain.shared.ports
  covering from_iso, from_any, all, __contains__, __len__ — the
  surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
  against the Protocol; the concrete LanguageRegistry stays in
  infrastructure as the YAML-backed adapter and remains the default
  when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
  cover the adapter surface (from_iso, from_any, membership,
  case-insensitivity, non-string inputs).

Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
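
To illustrate the in-memory fake this split allows, here is a sketch of a Protocol with the listed surface and a dict-backed implementation. The method signatures, the reduced `Language` shape, and the `InMemoryLanguages` class are assumptions, not the project's definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Language:
    iso: str  # ISO 639-2/B code such as "fre"
    aliases: tuple[str, ...] = ()


class LanguageRepository(Protocol):
    def from_iso(self, code: object) -> Optional[Language]: ...
    def from_any(self, token: object) -> Optional[Language]: ...
    def all(self) -> Iterable[Language]: ...
    def __contains__(self, token: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Test fake: satisfies the Protocol structurally, no YAML involved."""

    def __init__(self, languages: Iterable[Language]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}
        self._by_alias = {
            alias: lang for lang in self._by_iso.values() for alias in lang.aliases
        }

    def from_iso(self, code: object) -> Optional[Language]:
        if not isinstance(code, str):
            return None
        return self._by_iso.get(code.lower())

    def from_any(self, token: object) -> Optional[Language]:
        if not isinstance(token, str):
            return None
        key = token.lower()
        return self._by_iso.get(key) or self._by_alias.get(key)

    def all(self) -> Iterable[Language]:
        return list(self._by_iso.values())

    def __contains__(self, token: object) -> bool:
        return self.from_any(token) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


def describe(repo: LanguageRepository, token: object) -> str:
    lang = repo.from_any(token)
    return lang.iso if lang else "unknown"


repo = InMemoryLanguages([Language("fre", ("fr", "fra")), Language("eng", ("en",))])
```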
2026-05-20 23:18:25 +02:00
francwa c303efea48 refactor(probe): consolidate full probe() into MediaProber port
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.

Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).

Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
2026-05-20 09:11:24 +02:00
francwa 6802933acd test(release): adapt suite to explicit ReleaseKnowledge injection
- test_release.py / test_release_fixtures.py: module-level
  _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
  into parse_release. test_show_folder_name_strips_windows_chars renamed
  to test_show_folder_name_uses_already_safe_title to reflect the
  Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
  detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
  title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
  class (helper deleted), add tmdb_title_safe arg to
  _resolve_series_folder calls.

987 passed, 8 skipped.
2026-05-19 22:05:26 +02:00
francwa 6e252d1e81 refactor(subtitles): inject default rules into SubtitleRuleSet.resolve()
aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a
DEFAULT_RULES() helper, which silently pulled the infrastructure layer
(YAML loader) into the domain on every resolve.

Make the dependency explicit: resolve() now takes the default rules as
a parameter, and the caller (the ManageSubtitles use case) loads them
from the KB once and passes them in. Domain stays I/O-free.

- Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from
  alfred/domain/subtitles/aggregates.py
- SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules)
- manage_subtitles use case passes kb.default_rules() at the call site
- Tests use a local SubtitleMatchingRules stand-in instead of relying
  on KB defaults
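
The injection can be pictured with the sketch below. The two rule fields are hypothetical; only the `resolve(default_rules: SubtitleMatchingRules)` signature comes from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubtitleMatchingRules:
    preferred_languages: tuple[str, ...]
    min_confidence: float


@dataclass(frozen=True)
class SubtitleRuleSet:
    """Per-scope overrides; None means 'fall back to the defaults'."""

    preferred_languages: Optional[tuple[str, ...]] = None
    min_confidence: Optional[float] = None

    def resolve(self, default_rules: SubtitleMatchingRules) -> SubtitleMatchingRules:
        # Pure function of its inputs: no knowledge-base or YAML access here.
        languages = self.preferred_languages
        if languages is None:
            languages = default_rules.preferred_languages
        confidence = self.min_confidence
        if confidence is None:
            confidence = default_rules.min_confidence
        return SubtitleMatchingRules(languages, confidence)


# The caller loads the defaults once (a local stand-in here) and passes them in.
defaults = SubtitleMatchingRules(("fre", "eng"), 0.7)
resolved = SubtitleRuleSet(min_confidence=0.9).resolve(defaults)
```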
2026-05-19 15:10:06 +02:00
francwa 903e9e7117 refactor(subtitles): move SubtitlePlacer to application layer
The placer performs filesystem I/O (os.link) — it belongs in the
application layer, not the domain. Domain services should be pure.

- Move alfred/domain/subtitles/services/placer.py to
  alfred/application/subtitles/placer.py
- Move tests/domain/test_subtitle_placer.py to
  tests/application/test_subtitle_placer.py
- Update all callers (manage_subtitles use case, metadata store, tests)
- Drop placer re-exports from domain.subtitles.services.__init__
2026-05-19 15:07:39 +02:00
francwa 891ba502a2 chore: apply pre-commit auto-fixes (trim trailing whitespace, EOF) 2026-05-17 23:41:54 +02:00
francwa e07c9ec77b chore: sprint cleanup — language unification, parser unification, fossils removal
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; the well-formedness check only
  rejects truly forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.
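
A sketch of the separator-driven tokenizer described above, with the separator list inlined instead of loaded from separators.yaml. The `tokenize` function is hypothetical, and it skips the site-tag extraction that the real parser runs first, so tags such as `YTS.MX` are simply split here.

```python
import re

# Mirrors the separators listed above; the real list lives in YAML.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize(name: str, separators=SEPARATORS) -> list[str]:
    # Build one character class from the configured separators.
    pattern = "[" + re.escape("".join(separators)) + "]+"
    # Drop the empty strings produced by leading or trailing separators.
    return [token for token in re.split(pattern, name) if token]


tokens = tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
dotted = tokenize("Show.Name.S01E02.1080p")
```

Adding a new separator is then a data change: extend the list and the pattern follows.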

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00
francwa e45465d52d feat: split resolve_destination, persona-driven prompts, qBittorrent relocation
Destination resolution
- Replace the single ResolveDestinationUseCase with four dedicated
  functions, one per release type:
    resolve_season_destination    (pack season, folder move)
    resolve_episode_destination   (single episode, file move)
    resolve_movie_destination     (movie, file move)
    resolve_series_destination    (multi-season pack, folder move)
- Each returns a dedicated DTO carrying only the fields relevant to
  that release type — no more polymorphic ResolvedDestination with
  half the fields unused depending on the case.
- Looser series folder matching: exact computed-name match is reused
  silently; any deviation (different group, multiple candidates) now
  prompts the user with all options including the computed name.

Agent tools
- Four new tools wrapping the use cases above; old resolve_destination
  removed from the registry.
- New move_to_destination tool: create_folder + move, chained — used
  after a resolve_* call to perform the actual relocation.
- Low-level filesystem_operations module (create_folder, move via mv)
  for instant same-FS renames (ZFS).

Prompt & persona
- New PromptBuilder (alfred/agent/prompt.py) replacing prompts.py:
  identity + personality block, situational expressions, memory
  schema, episodic/STM/config context, tool catalogue.
- Per-user expression system: knowledge/users/common.yaml +
  {username}.yaml are merged at runtime; one phrase per situation
  (greeting/success/error/...) is sampled into the system prompt.
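
The merge-then-sample step might look like the sketch below, with plain dicts standing in for the two YAML files. The merge semantics are an assumption (user phrases replace the common ones for a given situation), and both function names are made up for the example.

```python
import random


def merge_expressions(common: dict, user: dict) -> dict:
    merged = {situation: list(phrases) for situation, phrases in common.items()}
    for situation, phrases in user.items():
        # Assumed rule: a user-level entry overrides the common one.
        merged[situation] = list(phrases)
    return merged


def sample_expressions(expressions: dict, rng: random.Random) -> dict:
    # One phrase per situation ends up in the system prompt.
    return {
        situation: rng.choice(phrases)
        for situation, phrases in expressions.items()
        if phrases
    }


common = {"greeting": ["Hello."], "error": ["Something went wrong."]}
user = {"greeting": ["Good evening."]}

merged = merge_expressions(common, user)
picked = sample_expressions(merged, random.Random(0))
```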

qBittorrent integration
- Credentials now come from settings (qbittorrent_url/username/password)
  instead of hardcoded defaults.
- New client methods: find_by_name, set_location, recheck — the trio
  needed to update a torrent's save path and re-verify after a move.
- Host→container path translation settings (qbittorrent_host_path /
  qbittorrent_container_path) for docker-mounted setups.
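
The host-to-container translation can be sketched as a prefix swap. The function name and the example paths are hypothetical; the two roots correspond to the qbittorrent_host_path / qbittorrent_container_path settings.

```python
from pathlib import PurePosixPath


def host_to_container(path: str, host_root: str, container_root: str) -> str:
    """Rewrite a host path into the path the container sees."""
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(host_root))
    except ValueError:
        # Not under the mounted root: leave the path unchanged.
        return path
    return str(PurePosixPath(container_root) / relative)


inside = host_to_container("/tank/media/movies/film.mkv", "/tank/media", "/data")
outside = host_to_container("/home/user/film.mkv", "/tank/media", "/data")
```

Comparing path components rather than raw string prefixes avoids treating `/tank/media2` as if it were under `/tank/media`.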

Subtitles
- Identifier: strip parenthesized qualifiers (simplified, brazil…) at
  tokenization; new _tokenize_suffix used for the episode_subfolder
  pattern so episode-stem tokens no longer pollute language detection.
- Placer: extract _build_dest_name so it can be reused by the new
  dry_run path in ManageSubtitlesUseCase.
- Knowledge: add yue, ell, ind, msa, rus, vie, heb, tam, tel, tha,
  hin, ukr; add 'fre' to fra; add 'simplified'/'traditional' to zho.

Misc
- LTM workspace: add 'trash' folder slot.
- Default LLM provider switched to deepseek.
- testing/debug_release.py: CLI to parse a release, hit TMDB, and
  dry-run the destination resolution end-to-end.
2026-05-14 05:01:59 +02:00
francwa 249c5de76a feat: major architectural refactor
- Refactor memory system (episodic/STM/LTM with components)
- Implement complete subtitle domain (scanner, matcher, placer)
- Add YAML workflow infrastructure
- Externalize knowledge base (patterns, release groups)
- Add comprehensive testing suite
- Create manual testing CLIs
2026-05-11 21:55:06 +02:00