98c688f29b
Add the building blocks for Phase A scoring without yet wiring them into parse_release. Nothing changes at runtime: parse_release still returns a single ParsedRelease, but the pieces needed to upgrade it in a follow-up commit are now in place.

- alfred/knowledge/release/scoring.yaml: weights / penalties / thresholds. Title and media_type are heavy (30 / 20), structural fields medium (year 15, season 10), tech fields light (5 each). Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY / PATH_OF_PAIN, distinct from ParsePath, which records the tokenization route), and pure functions: compute_score, decide_road, collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py, exported alongside ParsedRelease.
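The load_scoring() bullet promises that a missing or partial YAML only de-tunes the scorer and never breaks it. A minimal Python sketch of that behaviour follows, assuming the file is parsed with PyYAML's `yaml.safe_load` and that overrides are merged key by key over baked-in defaults. `DEFAULT_SCORING` and `merge_scoring` are hypothetical names, not the commit's actual code, and the rule that unknown keys and non-integer values are silently ignored is likewise an assumption.

```python
from __future__ import annotations

import copy
from collections.abc import Mapping
from pathlib import Path
from typing import Any

# Hypothetical baked-in defaults mirroring scoring.yaml.
DEFAULT_SCORING: dict[str, dict[str, int]] = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15, "season": 10,
        "episode": 5, "resolution": 5, "source": 5, "codec": 5, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 30},
    "thresholds": {"shitty_min": 60},
}


def merge_scoring(raw: Any) -> dict[str, dict[str, int]]:
    """Overlay a parsed (possibly partial or garbled) YAML mapping on the defaults."""
    merged = copy.deepcopy(DEFAULT_SCORING)  # never mutate the shared defaults
    if not isinstance(raw, Mapping):
        return merged  # empty file or non-mapping document: pure defaults
    for section, values in merged.items():
        overrides = raw.get(section)
        if not isinstance(overrides, Mapping):
            continue  # absent or malformed section keeps its defaults
        for key, value in overrides.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if key in values and isinstance(value, int) and not isinstance(value, bool):
                values[key] = value
    return merged


def load_scoring(path: Path) -> dict[str, dict[str, int]]:
    """Read scoring.yaml; any failure degrades to the defaults."""
    try:
        import yaml  # PyYAML, imported lazily so its absence is survivable too
        raw = yaml.safe_load(path.read_text(encoding="utf-8"))
    except Exception:  # missing file, invalid YAML, or PyYAML not installed
        raw = None
    return merge_scoring(raw)
```

Because `merge_scoring` takes an already-parsed mapping, the fallback rules can be tested without touching the filesystem; `load_scoring` only adds the I/O and the catch-all.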
43 lines
1.5 KiB
YAML
# Release parse scoring.
#
# `parse_release` returns a `ParseReport` alongside the `ParsedRelease`.
# The report carries a 0-100 confidence score computed from the annotated
# tokens, plus the road decision (EASY / SHITTY / PATH_OF_PAIN).
#
# Why YAML: the weights and the SHITTY/PoP cutoff are tuning knobs we
# expect to iterate on as fixtures grow. Keeping them in code would
# mean a commit per tweak; here the user can adjust without touching
# Python.
#
# Weights are awarded when the corresponding ParsedRelease field is
# populated (non-None, non-"UNKNOWN" for group). Season and episode
# only contribute when the parse looks like TV (season is not None).

weights:
  title: 30          # structural pivot — without it nothing else matters
  media_type: 20     # movie / tv_show / tv_complete / …
  year: 15
  season: 10         # only counted for TV-shaped releases
  episode: 5
  resolution: 5
  source: 5
  codec: 5
  group: 5           # "UNKNOWN" yields 0

# Penalty applied per UNKNOWN token left in the annotated stream.
# Capped at `max_unknown_penalty` to keep a long-tail of garbage from
# pushing every release into PoP.
penalties:
  unknown_token: 5
  max_unknown_penalty: 30

# Decision thresholds.
#
# EASY is decided structurally (a known group schema matched) — it does
# not look at the score. SHITTY vs PATH_OF_PAIN is decided here:
#
#   score >= shitty_min  → SHITTY        (best-effort parse usable)
#   score <  shitty_min  → PATH_OF_PAIN  (needs user / LLM help)
thresholds:
  shitty_min: 60