alfred/alfred/knowledge/release/scoring.yaml
francwa 98c688f29b feat(release): foundations for parse-confidence scoring
Add the building blocks for Phase A scoring without yet wiring them
into parse_release. Nothing changes at runtime — parse_release still
returns a single ParsedRelease — but the pieces needed to upgrade it
in a follow-up commit are now in place.

- alfred/knowledge/release/scoring.yaml: weights / penalties /
  thresholds. Title and media_type are heavy (30 / 20), structural
  fields medium (year 15, season 10), tech fields light (5 each).
  Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or
  partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge
  populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY /
  PATH_OF_PAIN, distinct from ParsePath which records the tokenization
  route), and pure functions: compute_score, decide_road,
  collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py — exported alongside
  ParsedRelease.
2026-05-20 01:21:17 +02:00


# Release parse scoring.
#
# `parse_release` returns a `ParseReport` alongside the `ParsedRelease`.
# The report carries a 0-100 confidence score computed from the annotated
# tokens, plus the road decision (EASY / SHITTY / PATH_OF_PAIN).
#
# Why YAML: the weights and the SHITTY/PoP cutoff are tuning knobs we
# expect to iterate on as fixtures grow. Keeping them in code would
# mean a commit per tweak; here the user can adjust without touching
# Python.
#
# A weight is awarded when the corresponding ParsedRelease field is
# populated (not None; for group, also not "UNKNOWN"). Season and episode
# only contribute when the parse looks like TV (season is not None).
weights:
  title: 30        # structural pivot; without it nothing else matters
  media_type: 20   # movie / tv_show / tv_complete / …
  year: 15
  season: 10       # only counted for TV-shaped releases
  episode: 5
  resolution: 5
  source: 5
  codec: 5
  group: 5         # "UNKNOWN" yields 0
# Penalty applied per UNKNOWN token left in the annotated stream.
# Capped at `max_unknown_penalty` to keep a long tail of garbage tokens
# from pushing every release into PATH_OF_PAIN (PoP).
penalties:
  unknown_token: 5
  max_unknown_penalty: 30
# Decision thresholds.
#
# EASY is decided structurally (a known group schema matched) — it does
# not look at the score. SHITTY vs PATH_OF_PAIN is decided here:
#
#   score >= shitty_min  → SHITTY        (best-effort parse usable)
#   score <  shitty_min  → PATH_OF_PAIN  (needs user / LLM help)
thresholds:
  shitty_min: 60
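
The rules described in the comments above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in parser/scoring.py, which is not shown here: `Parsed` is a hypothetical stand-in for `ParsedRelease`, `merge_scoring` is a hypothetical helper that mimics the "safe defaults" behaviour attributed to `load_scoring()`, and the signatures of `compute_score` and `decide_road` are guesses. Clamping the result at 0 is also an assumption, made so the score stays in the documented 0-100 range.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Assumed defaults: they simply mirror the values in this YAML file.
DEFAULT_SCORING = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15, "season": 10,
        "episode": 5, "resolution": 5, "source": 5, "codec": 5, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 30},
    "thresholds": {"shitty_min": 60},
}


def merge_scoring(raw: Optional[dict]) -> dict:
    """Hypothetical helper: overlay a missing or partial mapping on the defaults."""
    raw = raw or {}
    merged = {}
    for section, defaults in DEFAULT_SCORING.items():
        override = raw.get(section)
        merged[section] = {**defaults, **(override if isinstance(override, dict) else {})}
    return merged


class Road(Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


@dataclass(frozen=True)
class Parsed:
    """Hypothetical stand-in for ParsedRelease; only the scored fields."""
    title: Optional[str] = None
    media_type: Optional[str] = None
    year: Optional[int] = None
    season: Optional[int] = None
    episode: Optional[int] = None
    resolution: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    group: Optional[str] = None


def compute_score(parsed: Parsed, unknown_tokens: int, scoring: dict) -> int:
    """Sum the weights of populated fields, then subtract the capped penalty."""
    is_tv = parsed.season is not None
    total = 0
    for field, weight in scoring["weights"].items():
        value = getattr(parsed, field, None)
        if value is None:
            continue
        if field == "group" and value == "UNKNOWN":
            continue  # "UNKNOWN" yields 0
        if field in ("season", "episode") and not is_tv:
            continue  # only counted for TV-shaped releases
        total += weight
    pen = scoring["penalties"]
    penalty = min(unknown_tokens * pen["unknown_token"], pen["max_unknown_penalty"])
    return max(0, min(100, total - penalty))  # clamping is an assumption


def decide_road(score: int, group_schema_matched: bool, scoring: dict) -> Road:
    """EASY is structural; otherwise the score is compared with shitty_min."""
    if group_schema_matched:
        return Road.EASY
    if score >= scoring["thresholds"]["shitty_min"]:
        return Road.SHITTY
    return Road.PATH_OF_PAIN


scoring = merge_scoring({"thresholds": {"shitty_min": 60}})  # partial input
movie = Parsed(title="Some Movie", media_type="movie", year=2020,
               resolution="1080p", source="BluRay", codec="x264", group="UNKNOWN")
score = compute_score(movie, unknown_tokens=2, scoring=scoring)
print(score, decide_road(score, False, scoring))  # 70 Road.SHITTY
```

With the values in this file, the movie above earns 30 + 20 + 15 + 5 + 5 + 5 = 80 for its populated fields, loses 2 × 5 = 10 for the two unknown tokens, and lands at 70, which is above `shitty_min`. A parse with only a title and eight unknown tokens would score 30 minus the capped penalty of 30, so it falls through to PATH_OF_PAIN.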