Know it’s working
Most context files are hope in a text box. You add a rule, and you have no way to tell whether the model followed it, ignored it, or never reached it. MemMini’s first distinctive feature is the answer to “is this rule actually doing anything?”
A rule binds iff it’s measurable
Section titled “A rule binds iff it’s measurable”The principle: a rule binds if and only if its compliance can be measured. If you can’t write a check that could fail when the rule is broken, you don’t have a rule — you have a wish.
So each rule worth enforcing gets a mechanical tripwire that emits PASS or FAIL to a
rolling scoreboard. “Did this rule change what the agent does?” becomes a test that
can fail — not a claim that the text exists in CONTEXT.md. That’s the difference
between self-improving and self-congratulatory.
Why “the text exists” isn’t enough
Section titled “Why “the text exists” isn’t enough”A line can sit in your compiled context and still never fire — buried in the weak-attention middle, outcompeted by task pressure, or written as an aspiration the model can’t detect a moment to apply. Checking that the line is present proves nothing about behavior. Checking that the behavior changed — and that the check reverts to FAIL when you undo the rule — proves it works.
This is the same completion discipline MemMini applies everywhere: work is done only when the receiver confirms it. For a rule, the receiver is a test that exercises a real trigger, not a re-read of the file.
How it fits the loop
Section titled “How it fits the loop”Rule-binding evals are what let MemMini improve itself safely. Every change must move a score on the scoreboard, and the pass that edits a rule may not edit the score that judges it — otherwise the system would just rewrite its own definition of “better” and declare victory. Held-out rewards are why the improvement loop converges instead of collapsing.
Go deeper
Section titled “Go deeper”- Canonical spec — eval-case schema, scoreboard format:
wiki/contracts/behavioral-eval.md - The five-step loop it powers:
skills/distill/SKILL.md - What “received” means: Completion is visibility