Know it’s working

Most context files are hope in a text box. You add a rule, and you have no way to tell whether the model followed it, ignored it, or never reached it. MemMini’s first distinctive feature is the answer to “is this rule actually doing anything?”

A rule binds iff it’s measurable

The principle: a rule binds if and only if its compliance can be measured. If you can’t write a check that could fail when the rule is broken, you don’t have a rule — you have a wish.

So each rule worth enforcing gets a mechanical tripwire that emits PASS or FAIL to a rolling scoreboard. “Did this rule change what the agent does?” becomes a test that can fail — not a claim that the text exists in CONTEXT.md. That’s the difference between self-improving and self-congratulatory.

Why “the text exists” isn’t enough

A line can sit in your compiled context and still never fire — buried in the weak-attention middle, outcompeted by task pressure, or written as an aspiration the model can’t detect a moment to apply. Checking that the line is present proves nothing about behavior. Checking that the behavior changed — and that the check reverts to FAIL when you undo the rule — proves it works.

This is the same completion discipline MemMini applies everywhere: work is done only when the receiver confirms it. For a rule, the receiver is a test that exercises a real trigger, not a re-read of the file.

How it fits the loop

Rule-binding evals are what let MemMini improve itself safely. Every change must move a score on the scoreboard, and the pass that edits a rule may not edit the score that judges it — otherwise the system would just rewrite its own definition of “better” and declare victory. Held-out rewards are why the improvement loop converges instead of collapsing.

Go deeper

Canonical spec — eval-case schema, scoreboard format: wiki/contracts/behavioral-eval.md
The five-step loop it powers: skills/distill/SKILL.md
What “received” means: Completion is visibility