alfred

Author	SHA1	Message	Date
francwa	ed7680b58f	docs(changelog): log parse-confidence scoring + ParseReport tuple	2026-05-20 01:21:47 +02:00
francwa	629387591f	docs(changelog): freeze release parser v2 work block (2026-05-20)	2026-05-20 01:08:17 +02:00
francwa	230a7ab88a	docs(changelog): log SHITTY simplification + distributor split	2026-05-20 01:03:52 +02:00
francwa	7dc7f0c241	feat(release): v2 enricher pass for audio/video-meta/edition/language
The EASY pipeline now extracts the full ParsedRelease surface from known-group releases, not just the structural backbone. Behavior is unchanged for releases that don't carry these tokens.
Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires body to be fully consumed. Tokens passed over between schema chunks remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching a given role, skipping already-annotated tokens. Lets optional and mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta / kb.editions are matched first (longest-first ordering preserved from the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token of a matched sequence with extra['sequence']=<canonical value> and trailing members with extra['sequence_member']='True' so assemble skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the '.' separator splits the layout into two tokens. Strips a trailing '-GROUP' suffix on the second before joining.
Assemble:
- New fields populated: languages (list), audio_codec, audio_channels, bit_depth, hdr_format, edition. Each role-handler skips sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE, COLLECTION} + no season → tv_complete (mirrors legacy).
Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels, HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green.
Suite: 1011 passed, 8 skipped.
Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:26:05 +02:00
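Below is a minimal sketch of the two token-level helpers described in the enricher commit, `_detect_channel_pairs()` and `_apply_sequences()`. It is not the repository's implementation. It works on plain strings and per-token extras dicts rather than the real Token objects. The `CHANNEL_LAYOUTS` and `SEQUENCES` vocabularies are hypothetical stand-ins for what kb.audio / kb.video_meta / kb.editions load from YAML.

```python
# Hypothetical vocabularies; the real ones come from the YAML knowledge base.
CHANNEL_LAYOUTS = {"2.0", "5.1", "7.1"}

# (token sequence, canonical value), kept longest-first as in the YAML.
SEQUENCES = [
    (("DTS-HD", "MA"), "DTS-HD.MA"),
    (("TrueHD", "Atmos"), "TrueHD.Atmos"),
    (("DTS-HD",), "DTS-HD"),
]


def detect_channel_pairs(texts: list[str]) -> list[tuple[int, str]]:
    """Find layouts like '5.1' that the '.' separator split into two tokens."""
    pairs = []
    for i in range(len(texts) - 1):
        # '1-KONTRAST' -> '1': drop a trailing '-GROUP' before joining.
        second = texts[i + 1].split("-", 1)[0]
        layout = f"{texts[i]}.{second}"
        if layout in CHANNEL_LAYOUTS:
            pairs.append((i, layout))
    return pairs


def apply_sequences(
    texts: list[str], sequences: list[tuple[tuple[str, ...], str]]
) -> list[dict[str, str]]:
    """Tag multi-token sequences; returns one extras dict per token."""
    extras: list[dict[str, str]] = [{} for _ in texts]
    i = 0
    while i < len(texts):
        for parts, canonical in sequences:  # first match wins, so order matters
            n = len(parts)
            window = [t.upper() for t in texts[i : i + n]]
            if window == [p.upper() for p in parts]:
                extras[i]["sequence"] = canonical  # head carries the value
                for j in range(i + 1, i + n):
                    extras[j]["sequence_member"] = "True"  # assemble skips these
                i += n
                break
        else:
            i += 1  # no sequence starts at this token
    return extras
```

The longer ("DTS-HD", "MA") entry is tried before the single ("DTS-HD",) entry. As a result, DTS-HD.MA is claimed as one unit instead of leaving MA behind as an UNKNOWN token.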
francwa	075a827b0e	feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST, -ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY heuristic in services.py; nothing changes for them.
Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP shape, fallback to any non-source dashed token), GroupSchema lookup via the kb port, then lockstep walk of tokens against schema chunks. Optional chunks skip on mismatch, mandatory mismatches return None so the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible dict (title joined by '.', group from the codec-GROUP token's extras).
Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.
Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/*.yaml at construction time; learned overrides in data/knowledge/release/release_groups/ also picked up.
Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the canonical chunk_order. ELiTE marks source as optional (Foundation.S02 has no WEBRip token).
Services:
- parse_release tries the v2 path first; on None falls through to the legacy implementation untouched.
Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group detection (codec-GROUP, dashed-source skip, no-dash → unknown), schema-driven annotation (movie, TV episode, season pack with optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green: 5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures still produced by the legacy path. Verified via spy on v2.assemble.
Suite: 1007 passed, 8 skipped.
Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:21:11 +02:00
francwa	a2c917618f	feat(release): scaffold v2 parser package (annotate-based pipeline)
New package alfred/domain/release/parser/ lays the foundation for the release parser refactor (specs in memory). Exposes:
- Token: frozen VO carrying text + stream index + TokenRole + extra dict. with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN), string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring (group right-to-left, then season/episode, year, tech, title).
Legacy parse_release in release.services remains the live implementation until the annotate step lands. Scaffolding tests verify Token API, site-tag stripping (prefix/suffix), and tokenize output shape.
Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:12:33 +02:00
francwa	cd814c7922	docs(changelog): log refactor/domain-release-knowledge work block	2026-05-19 22:05:29 +02:00
francwa	df798f55cc	refactor(subtitles): introduce SubtitleKnowledge Protocol port Domain services (SubtitleIdentifier, PatternDetector) used to import the concrete SubtitleKnowledgeBase class directly from infrastructure for their type hint. With this commit they depend on a structural Protocol in alfred/domain/subtitles/ports/knowledge.py declaring just the 7 read-only query methods the domain actually consumes. The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains the sole adapter — no rename, no shim. With this change alfred/domain/subtitles/ has zero imports from alfred/infrastructure/. Also extend the changelog entry covering the full domain-io-extraction branch.	2026-05-19 15:15:43 +02:00
francwa	535935cc73	docs(changelog): summarize refactor/domain-io-extraction work block	2026-05-19 15:11:17 +02:00
francwa	f6eef59fca	refactor: tech debt mini-pass (items 5, 6, 7, 20)
Low-risk cleanup items, no functional change to the parser. The philosophy remains: keep the parser simple, the AI handles edge cases.
- Extract duplicated 'fs-safe title → dot-folder-name' regex into to_dot_folder_name() in domain/shared/value_objects.py. Used by both MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa) into a dedicated 'subtitle:' category instead of lumping them under 'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains the union of both, so detect_media_type behavior is unchanged. New loader load_subtitle_extensions() exposes the distinct subtitle set for future callers in the subtitles domain (#20).
Suite: 1020 passed, 8 skipped.	2026-05-18 16:24:28 +02:00
francwa	273510dff8	test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures
10 pathological release names mined from the real downloads folder. Each fixture locks in the current parse_release output (including its silent losses and false positives) so future parser improvements are intentional, not silent drift.
Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '｜', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio = full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception: COMPLETE token correctly drives tv_complete + REPACK + 10bit)
Also adds shitty + path_of_pain to the per-bucket sanity assertion.
Suite: 1020 passed, 8 skipped.	2026-05-18 15:57:56 +02:00
francwa	c1831e3f46	test(fixtures): drop derry_duplicate_naming (was a copy-paste artifact)
The release name mixed two distinct releases, so it is not a real-world case worth an anti-regression fixture. SHITTY bucket now holds 14 fixtures (down from 15).	2026-05-18 15:51:11 +02:00
francwa	aa182458b8	test(fixtures): seed SHITTY release bucket with 15 anti-regression cases
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/ covering the awkward but real-world release names from the downloads folder. Each fixture locks in the current parse_release behavior so future parser changes are intentional, not silent drift.
Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost; tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact; tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie; tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe: full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN; tech debt)
- Westworld S04 Subs.Only (no video file)
Each fixture also captures the future 3-flow routing (library / torrents / seed_hardlinks) ahead of the organize_media refactor.
Suite: 1011 passed, 8 skipped.	2026-05-18 15:48:41 +02:00
francwa	7bc50fd5b8	test: add real-world release fixtures (EASY bucket) Captures 5 canonical releases from /mnt/testipool/downloads as parametrized fixtures under tests/fixtures/releases/easy/. Each fixture declares the release name, expected ParsedRelease fields, original tree, and the future routing (library / torrents / seed_hardlinks) for the upcoming organize_media refactor. Today only the 'parsed' section is asserted; tree is materialized into a tmp_path to catch typos. Routing is captured ahead of the planner work — it becomes verifiable once organize_media lands. Cases: back_in_action (movie), slow_horses_single_ep (TV single), foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie + KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir). Also tracks CHANGELOG.md under [Unreleased] / Added.	2026-05-18 15:36:19 +02:00
francwa	6940c76e58	Updated README and did a little bit of cleanup	2025-12-09 04:24:16 +01:00
francwa	9ca31e45e0	feat!: migrate to OpenAI native tool calls and fix circular deps (#fuck-gemini)
- Fix circular dependencies in agent/tools
- Migrate from custom JSON to OpenAI tool calls format
- Add async streaming (step_stream, complete_stream)
- Simplify prompt system and remove token counting
- Add 5 new API endpoints (/health, /v1/models, /api/memory/*)
- Add 3 new tools (get_torrent_by_index, add_torrent_by_index, set_language)
- Fix all 500 tests and add coverage config (80% threshold)
- Add comprehensive docs (README, pytest guide)
BREAKING: LLM interface changed, memory injection via get_memory()	2025-12-06 19:11:05 +01:00

16 Commits