Perspective

The world is bigger than the model.

Every model ships frozen as of the day its training ended; the environment it works in does not hold still. The gap between a trained snapshot and a moving world is the real case for continual learning, and the deepest version of the argument comes from the person who co-wrote the book on reinforcement learning.

Zak Data Solutions · June 22, 2026

The case against a trained-once model is usually made on cost or recency: retraining is expensive, the data goes stale. Both are real, and both are symptoms. The deeper reason a frozen model falls behind your business is structural, and it has a name. Richard Sutton, who co-wrote the field's standard textbook on reinforcement learning with Andrew Barto, calls it the Big World Hypothesis.

The world is orders of magnitude larger than the agent and thus is perceived as ever-changing.
— Banafsheh Rafiee & Richard S. Sutton, Toward Enactive AI (2026)

Unpack that. Any model, however large, is a finite thing trained on a finite slice of the past. The environment it has to operate in — your data, your systems, your customers, the regulations, the decisions made this morning — is vastly larger than that slice, and it never stops moving. So from the model's point of view, the world is always changing out from under it. You do not fix that by training a bigger model on more of the past, because the problem was never that the snapshot was small. The problem is that it is a snapshot. The only structural answer is a system that keeps interacting and keeps learning. Continual learning is not a feature you bolt on; it is the direct consequence of the world being bigger than any model of it.

Continuing a pattern is not handling the exception

There is a sharper way to feel why this matters in practice. A model trained to imitate patterns is very good at continuing them: give it the regular case and it extends the regular case convincingly. But the value in most real work lives in the exceptions: the contract with a clause that does not fit the template, the pipeline that fails in a way the runbook never described, the customer whose situation is genuinely new. Rafiee and Sutton draw the line precisely:

A generative model can continue a pattern, whereas an enactive system can determine what to do next when the pattern breaks.
— Toward Enactive AI (2026)

That is the whole difference between a system that talks about your business and one that runs inside it. A model that has only ever seen the regular case continues the pattern straight off a cliff when the exception arrives. A system that has acted in your environment, kept the state, and learned from being wrong is the one positioned to notice the break and decide what to do about it. The academic word is enactive. The operational word is simpler: it has been here before, and it remembers.

The gap a frozen model cannot close alone

There is one more piece, and most teams meet it the hard way. The same paper makes a blunt observation about today's models: they are not able to evaluate their own outputs without outsourcing evaluation to external signals. A model does not know when it is wrong. It will continue a broken pattern with the same confidence it brings to a correct one. Closing that gap is not a prompt you can write — it is an architecture. The system has to form an expectation, act, compare the real outcome against what it expected, and turn each miss into a standing rule so the same mistake cannot recur. That self-evaluation loop is the thing a raw model lacks, and it is the thing worth building.
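To make the loop concrete, here is a minimal sketch in Python of the four steps just described: form an expectation, act, compare, and record a standing rule on a miss. It is an illustration under stated assumptions, not a description of any production system. Every name in it (SelfEvaluatingAgent, Guardrail, toy_propose, toy_execute) is hypothetical, the environment is a toy, and matching situations by exact string is a deliberate simplification. Note that the signal that exposes the miss comes from the environment, not from the proposing model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Guardrail:
    """A standing rule recorded after a miss (hypothetical structure)."""
    situation: str
    rejected_action: str
    reason: str


@dataclass
class SelfEvaluatingAgent:
    # propose(situation, blocked_actions) -> (action, expected_outcome)
    propose: Callable[[str, List[str]], Tuple[str, str]]
    # execute(situation, action) -> outcome observed in the environment
    execute: Callable[[str, str], str]
    guardrails: List[Guardrail] = field(default_factory=list)

    def blocked_actions(self, situation: str) -> List[str]:
        """Actions that earlier misses have ruled out for this situation."""
        return [g.rejected_action for g in self.guardrails
                if g.situation == situation]

    def step(self, situation: str) -> bool:
        """Run one expect/act/compare cycle; return True if the outcome matched."""
        # 1. Form an expectation, respecting the rules learned so far.
        action, expected = self.propose(situation, self.blocked_actions(situation))
        # 2. Act, and take the outcome from the environment.
        observed = self.execute(situation, action)
        # 3. Compare the real outcome against the expectation.
        if observed == expected:
            return True
        # 4. Turn the miss into a standing rule that persists across steps.
        reason = f"expected {expected!r}, observed {observed!r}"
        self.guardrails.append(Guardrail(situation, action, reason))
        return False


def toy_propose(situation: str, blocked: List[str]) -> Tuple[str, str]:
    """Toy policy: try the cheapest action that is not blocked."""
    candidates = ["auto_approve", "escalate_to_human"]
    action = next((c for c in candidates if c not in blocked), candidates[-1])
    return action, "resolved"


def toy_execute(situation: str, action: str) -> str:
    """Toy environment: only escalation resolves the unusual case."""
    return "resolved" if action == "escalate_to_human" else "rejected"


if __name__ == "__main__":
    agent = SelfEvaluatingAgent(propose=toy_propose, execute=toy_execute)
    for attempt in (1, 2):
        ok = agent.step("contract_with_unusual_clause")
        print(f"attempt {attempt}: matched expectation = {ok}")
    print(agent.guardrails)
```

In this toy run the first attempt misses, a guardrail is recorded, and the second attempt avoids the rejected action. A real system would need a richer notion of "same situation" than string equality, and a way to review or retire rules that turn out to be wrong.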

Where we stand

We build continually-learning agents for the operational kind of world — your data, your systems, the way your organization actually works. They observe that environment, do real work in it, and compound what they learn into a knowledge base you own and can audit. We are not claiming a finished science: the frontier Sutton points at — fully self-directed agency — is open, and honesty requires saying so. What we have built and run is the practical core of it: experience at the center, state kept across sessions, and a self-evaluation loop that turns each failure into a standing guardrail. The bet underneath all of it is the Big World Hypothesis. Your world is bigger than any model of it. So we stopped shipping the snapshot, and started shipping the thing that keeps learning.
