about
what this is
Internal sandbox for the criterion-probe. The probe is a frozen backbone (Qwen-2.5-3B-Instruct) plus a 2048-dim LR head; it returns the probability that a piece of content satisfies a single English criterion sentence. It reads pragmatic register at the ASSESSMENT seed position; it does not reason.
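As a rough illustration of what the head computes, here is a minimal sketch of a logistic readout over a 2048-dim feature vector. It assumes "LR" means logistic regression and that the input is the backbone's hidden state at the seed position. The function name, weights, and bias are placeholders for illustration, not the trained head.

```python
import math
from typing import Sequence

HIDDEN_DIM = 2048  # head input width stated above


def lr_head_probability(features: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Hypothetical readout: sigmoid of a linear map, p = sigmoid(w . h + b)."""
    if len(features) != HIDDEN_DIM or len(weights) != HIDDEN_DIM:
        raise ValueError(f"expected {HIDDEN_DIM}-dim inputs")
    logit = sum(w * h for w, h in zip(weights, features)) + bias
    # Branch on the sign so exp() never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

With all-zero weights and bias the sketch returns 0.5, i.e. no evidence either way. The backbone only supplies the feature vector; nothing in this step generates text or chains inferences.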
how to use it
- playground — one criterion + one content, single score.
- batch — up to 64 (criterion, content) pairs in one call.
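For workloads larger than one batch call, the pairs have to be split client-side. The sketch below shows one way to do that; it only handles the chunking and assumes nothing about the batch endpoint itself. `chunk_pairs` and `MAX_BATCH` are illustrative names, not part of the sandbox.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_BATCH = 64  # per-call limit stated above

Pair = Tuple[str, str]  # (criterion, content)


def chunk_pairs(pairs: Sequence[Pair],
                size: int = MAX_BATCH) -> Iterator[List[Pair]]:
    """Yield consecutive slices of at most `size` pairs, preserving order."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(pairs), size):
        yield list(pairs[start:start + size])
```

For example, 150 pairs would go out as three calls of 64, 64, and 22 pairs, and the scores can be concatenated back in the original order.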
what it does well / badly
- Well: topical, lexical, sentiment, multi-clause AND/OR, deontic, counterfactual. AUCs typically 0.9+.
- Badly: negation (anti-correlated on Qwen-2.5-3B, see SETTLED_FINDINGS §4.1), arithmetic / counting, presupposition vs assertion, abstract quality judgments ("comprehensive", "helpful").
- Anti-correlated on compound vibe criteria: a multi-axis criterion like "comprehensive AND helpful AND clear AND child-accessible" ranks the WORST items highest. For quality assessment, decompose the criterion into atoms or use an LM judge.
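The decomposition advice can be sketched as follows: split the compound criterion on its AND connectives and score each atom on its own. The `score` callable stands in for whatever single-pair scoring call you use (its signature here is an assumption), and `split_atoms` / `score_atoms` are illustrative names.

```python
import re
from typing import Callable, Dict, List

# Hypothetical scorer signature: (criterion, content) -> probability.
Scorer = Callable[[str, str], float]


def split_atoms(compound: str) -> List[str]:
    """Split on the literal uppercase word AND; drop outer quotes and blanks."""
    parts = re.split(r"\s+AND\s+", compound.strip().strip('"'))
    return [p.strip() for p in parts if p.strip()]


def score_atoms(compound: str, content: str, score: Scorer) -> Dict[str, float]:
    """Score each atom separately and return the per-atom probabilities."""
    return {atom: score(atom, content) for atom in split_atoms(compound)}
```

Splitting is only the mechanical part. Atoms such as "comprehensive" or "helpful" are still abstract quality judgments, so each one likely needs rewriting into a concrete criterion sentence before the probe can score it reliably. How to combine the per-atom scores is left to the caller.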
links
- criterion-probe repo: github.com/nope-net/criterion-probe
- SETTLED_FINDINGS: full capability ladder + caveats (in the repo)
- SWEEP_RESULTS: base-model sweep result table (in the repo)