Context engineering gets the right information into the context window. That's necessary. It's not sufficient. Even when the model has the data, it doesn't know how to read it. It doesn't know what matters, what to ignore, what "empty" means versus what "missing" means. This methodology builds the scaffolding of judgment around the model so its capabilities land in exactly the form the situation demands.
A reasoning framework doesn't contain the answers. It contains the logic for finding them. What factors matter. What trade-offs exist. What signals carry weight. What vocabulary this company uses. The agent loads the framework and reads the data through it. The depth of application is controlled by the situation, not by the framework itself.
When the framework updates, every agent downstream reads through the new lens automatically. That's how the system stays aligned as the business evolves. The framework is the single point of change. Update the framework once, and every agent that loads it reasons differently from that point forward.
When your team reviews an analysis and adjusts a classification, the system captures that correction. After enough similar corrections, it softly adjusts how it classifies. Those adjustments decay if they're not reinforced. The system doesn't over-learn from noise.
After sustained patterns over weeks, the system proposes a permanent change to the reasoning framework. A human reviews the evidence and approves. The framework evolves. The system you deploy in month one is not the system you have in month twelve.
Six principles that prevent drift. Each one addresses a specific way agents lose alignment with how the business thinks.
Rules encode decisions. Frameworks encode reasoning. When context shifts, rules break. Frameworks adapt. That's the difference between brittle automation and genuine intelligence.
When one agent holds multiple perspectives, it finds coherence between them. Coherence isn't analysis. It's rationalisation. One lens per agent. Deep frameworks for that lens. Combine at the synthesis layer, which assembles but never evaluates. The tensions between perspectives are the information.
When a model encounters information it wasn't briefed on, the default is to include it. Pre-classify the distractions and give them somewhere to go that isn't the output. Capture them. Do not analyse them. That's another agent's job.
If the output structure doesn't have a risks section, the model can't hallucinate risks into a factual update. If there's no "Key Gaps" field, gaps can't appear. Design the container, not just the content. The structure constrains what the model can get wrong.
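A minimal sketch of what such a container could look like, with illustrative field names rather than any actual schema: the container simply has no slot for risks or gaps, and anything outside it is rejected rather than silently kept.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class FactualUpdate:
    """Output container for a factual-update agent.

    Deliberately has no 'risks' or 'key_gaps' field: if the schema has
    no slot for speculation, speculation cannot reach the output.
    """
    subject: str            # what the update is about
    changed_fields: list    # which data fields changed since the last run
    source_records: list    # record ids the update was read from
    as_of: str              # timestamp the data was read

def validate(raw: dict) -> FactualUpdate:
    # Reject anything outside the container instead of silently keeping it.
    allowed = {f.name for f in fields(FactualUpdate)}
    extras = set(raw) - allowed
    if extras:
        raise ValueError(f"fields outside the container: {sorted(extras)}")
    return FactualUpdate(**raw)
```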
Every system has the same pressure: resolve. Produce a single answer. A situation that supports three valid interpretations has not been understood when one is selected and the other two are discarded. Preserve the tensions. Resolution is the human's job.
The interpretive structures underlying human judgment can be abstracted enough to generalise beyond their origin. The specific knowledge stays behind. The judgment that travels, even imperfectly, is the only kind that scales.
Every agentic system that works in production has four layers: orchestration, frameworks, skills, and metadata. Most people don't separate them cleanly. In practice, each layer splits further because the functions inside it have different rules, different tempos, and different relationships to human control. The nine layers below are what those four become when you build for real.
The routing layer classifies incoming requests by intent and sets the analytical depth before any agent runs. Ambiguous inputs default to the least analytical path — shallow analysis is correctable, deep analysis on the wrong track wastes context and compounds errors. The router loads only the classification framework, never the domain frameworks. It decides which agents run, not what they conclude.
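A rough sketch of that routing decision, assuming a hypothetical intent table and confidence threshold (none of these names come from the methodology itself): the router picks which agents run and how deep they go, and falls back to the shallow path whenever the classification is uncertain.

```python
from dataclasses import dataclass
from enum import Enum

class Depth(Enum):
    SHALLOW = 1
    STANDARD = 2
    DEEP = 3

@dataclass
class Route:
    intent: str
    depth: Depth
    agents: list[str]   # which agents run; never what they conclude

# Illustrative intent table; real classification would use the
# routing framework, not a lookup.
INTENT_TABLE = {
    "status_check": (Depth.SHALLOW, ["reader"]),
    "account_review": (Depth.STANDARD, ["reader", "pattern"]),
    "full_analysis": (Depth.DEEP, ["reader", "pattern", "risk", "synthesis"]),
}

def route(intent: str | None, confidence: float) -> Route:
    # Ambiguous input defaults to the least analytical path:
    # shallow analysis is correctable, deep analysis on the wrong
    # track wastes context and compounds errors.
    if intent not in INTENT_TABLE or confidence < 0.7:
        return Route("status_check", Depth.SHALLOW, ["reader"])
    depth, agents = INTENT_TABLE[intent]
    return Route(intent, depth, agents)
```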
Orchestration governs agent sequencing, parallelism, and the human boundary. Agents that share an analytical dimension never run in parallel — their outputs would contaminate each other. The human boundary is placed at exactly one point in each flow: where observations become structural changes. The orchestrator manages state but never evaluates content.
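One way the parallelism constraint and the single human boundary might be expressed; the agent specs and callables here are placeholders, not the real orchestrator.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    name: str
    dimension: str   # the analytical lens this agent holds

def schedule(agents: list[AgentSpec]) -> list[list[AgentSpec]]:
    """Greedy batching: agents sharing an analytical dimension are never
    placed in the same parallel batch, so their outputs cannot
    contaminate each other."""
    batches: list[list[AgentSpec]] = []
    for agent in agents:
        for batch in batches:
            if all(a.dimension != agent.dimension for a in batch):
                batch.append(agent)
                break
        else:
            batches.append([agent])
    return batches

def run_flow(agents, run_agent, human_approval):
    observations = []
    for batch in schedule(agents):
        # The orchestrator manages state; it never evaluates content.
        observations.extend(run_agent(a) for a in batch)
    # The single human boundary: observations become structural
    # changes only after approval.
    return human_approval(observations)
```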
A reasoning framework encodes the interpretive logic your best people apply to data. Not the answers — the method. What factors matter, how to weight competing signals, what vocabulary this company uses, and what context changes the meaning. Frameworks are versioned. When a framework updates, every agent downstream reads through the new lens automatically. A typical domain requires a cluster of 8 to 12 frameworks covering different analytical dimensions.
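A possible shape for a framework as versioned data, with invented factor names, weights, and vocabulary entries purely to show the idea; a real framework cluster is richer than a weighted list.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Factor:
    name: str
    weight: float           # how much this signal counts relative to its peers
    noise_threshold: float  # below this, treat the signal as noise

@dataclass(frozen=True)
class ReasoningFramework:
    """Encodes the method, not the answers. Agents load this and read data
    through it; bumping the version changes every downstream read."""
    domain: str
    version: str
    factors: tuple[Factor, ...]
    vocabulary: dict = field(default_factory=dict)      # company terms -> canonical meaning
    context_shifts: dict = field(default_factory=dict)  # context -> factor weight overrides

renewal_risk = ReasoningFramework(
    domain="renewal_risk",
    version="2.3.0",
    factors=(
        Factor("login_velocity", weight=0.4, noise_threshold=0.05),
        Factor("support_escalations", weight=0.35, noise_threshold=0.0),
        Factor("seat_utilisation", weight=0.25, noise_threshold=0.1),
    ),
    vocabulary={"churned": "no active contract at period end"},
    context_shifts={"enterprise": {"support_escalations": 0.5}},
)
```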
Data reading guides teach agents how to query your specific systems. What each field actually means versus what it appears to mean. Which data sources are authoritative. What "empty" means versus "missing" versus "not yet populated." When a timestamp is reliable and when it reflects the last sync, not the last change. Without reading guides, agents query correctly but interpret incorrectly.
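An illustrative fragment of a reading guide for a hypothetical CRM export; the field names and caveats are examples, not anyone's actual data dictionary.

```python
from enum import Enum

class Absence(Enum):
    EMPTY = "empty"                # the field was set and then cleared: a real value
    MISSING = "missing"            # the field has never existed for this record
    NOT_YET_POPULATED = "pending"  # the sync that fills it hasn't run yet

# Illustrative reading guide for a hypothetical CRM export.
READING_GUIDE = {
    "renewal_date": {
        "means": "contract end date, not the date renewal was discussed",
        "authoritative_source": "billing_system",  # the CRM copy may lag
        "timestamp_caveat": "last_modified reflects the last sync, not the last change",
    },
    "owner": {
        "means": "current account owner; blank after a territory change",
        "blank_interpretation": Absence.NOT_YET_POPULATED,
    },
}

def interpret_blank(field_name: str) -> Absence:
    """Without a reading guide, agents query correctly but interpret
    incorrectly: a blank field defaults to MISSING unless the guide
    says otherwise."""
    entry = READING_GUIDE.get(field_name, {})
    return entry.get("blank_interpretation", Absence.MISSING)
```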
Each skill is a single agent with one analytical lens. The skill definition specifies what the agent should analyse, what it must never analyse, and what it should capture without evaluating. Prohibitions prevent scope creep — they are as important as the positive instructions. A noise-naming strategy gives the agent somewhere to put information that isn't its job. The structure constrains what the model can get wrong.
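A sketch of what a skill definition could hold, with an invented lens and invented prohibitions; the point is that the negative space and the noise destination are declared alongside the positive scope.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillDefinition:
    """One agent, one analytical lens. Prohibitions are as load-bearing
    as the positive instructions."""
    lens: str
    analyse: tuple[str, ...]          # what this agent evaluates
    never_analyse: tuple[str, ...]    # scope creep it must refuse
    capture_without_evaluating: str   # where off-lens information goes

usage_reader = SkillDefinition(
    lens="product_usage",
    analyse=("feature adoption", "usage velocity", "seat activity"),
    never_analyse=("pricing sentiment", "competitive mentions", "renewal intent"),
    capture_without_evaluating="observations.out_of_scope",  # another agent's job
)
```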
The assembly layer combines outputs from multiple agents into a single coherent view. It translates formats, resolves schema differences, and maintains provenance. The critical constraint: assembly never evaluates. It does not weigh one agent's output against another. It does not resolve tensions. The tensions between perspectives are the information. Resolution is the human's job.
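Roughly what an assembler that collates without evaluating could look like; the dictionary keys are illustrative. Note what it does not do: no weighing, no ranking, no merging of findings.

```python
def assemble(agent_outputs: list[dict]) -> dict:
    """Collate agent outputs into one view. Translation and provenance
    only: no weighing, no resolving, no ranking."""
    view = {"sections": [], "provenance": []}
    for output in agent_outputs:
        view["sections"].append({
            "lens": output["lens"],
            "findings": output["findings"],  # passed through untouched, side by side
        })
        view["provenance"].append({
            "agent": output["agent"],
            "framework_version": output["framework_version"],
            "sources": output.get("sources", []),
        })
    # Conflicting findings stay visible next to each other;
    # resolution is the human's job, not the assembler's.
    return view
```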
Every observation, classification, and pattern carries full provenance: which agent produced it, which framework version it read through, what data sources informed it, and when. This isn't logging — it's the mechanism that makes the system auditable and debuggable. When an output is wrong, metadata lets you trace exactly which framework, which data, and which agent produced it.
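One plausible shape for a provenance record; the field names are illustrative, but the four elements match what the layer is meant to carry.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Provenance:
    """Attached to every observation, classification, and pattern, so a
    wrong output can be traced to the exact agent, framework version,
    and data that produced it."""
    agent: str
    framework: str
    framework_version: str
    data_sources: tuple[str, ...]
    produced_at: datetime

record = Provenance(
    agent="usage_reader",
    framework="renewal_risk",
    framework_version="2.3.0",
    data_sources=("billing_system", "product_events"),
    produced_at=datetime.now(timezone.utc),
)
```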
The learning layer is how the system stays aligned as your business evolves. When your team corrects an analysis or the data reveals a pattern shift, the system captures it. Soft adjustments form immediately and influence future reasoning, but decay if not reinforced — the system won't over-learn from noise. When a pattern sustains across enough observations and enough time, the system proposes a permanent change to the reasoning frameworks. Humans review the evidence and decide. The system you deploy in month one reasons differently by month six — not because it drifted, but because it learned what your business learned.
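A simplified sketch of how a soft adjustment could strengthen, decay, and eventually earn a proposal. The half-life and thresholds are placeholders, not tuned values, and even a qualifying pattern only becomes a proposal for human review.

```python
from datetime import datetime, timedelta, timezone

class SoftAdjustment:
    """A correction-driven adjustment that strengthens with reinforcement
    and decays back toward zero when it stops being reinforced."""

    def __init__(self, half_life_days: float = 14.0):
        self.half_life = timedelta(days=half_life_days)
        self.strength = 0.0
        self.reinforcements = 0
        self.last_seen = datetime.now(timezone.utc)

    def reinforce(self, now: datetime) -> None:
        # Each similar correction adds weight on top of whatever survived decay.
        self.strength = self._decayed(now) + 1.0
        self.reinforcements += 1
        self.last_seen = now

    def _decayed(self, now: datetime) -> float:
        elapsed = now - self.last_seen
        return self.strength * 0.5 ** (elapsed / self.half_life)

    def should_propose_framework_change(self, now: datetime) -> bool:
        # Sustained pattern: enough corrections AND still strong after decay.
        # Even then it is only a proposal; a human reviews the evidence.
        return self.reinforcements >= 10 and self._decayed(now) >= 5.0
```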
Every type of intelligence gets a custom expiration strategy. Event-driven data expires on the next event. Velocity-based signals decay over time. Reinforcement-based patterns persist as long as they keep appearing. Without lifecycle management, a competitive signal from eight months ago looks identical to yesterday's signal to the model. The system forgets what's no longer true.
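One way the three expiry strategies might be expressed, with illustrative horizons rather than real calibration: the point is that each class of intelligence carries its own rule instead of a single global TTL.

```python
from datetime import datetime, timedelta

def is_current(item: dict, now: datetime) -> bool:
    """Each class of intelligence gets its own expiry rule."""
    kind = item["kind"]
    if kind == "event_driven":
        # Valid only until the next event of the same type arrives.
        return not item.get("superseded_by_newer_event", False)
    if kind == "velocity_based":
        # Decays on a fixed horizon: an eight-month-old competitive
        # signal should not read like yesterday's.
        return now - item["observed_at"] <= timedelta(days=item.get("horizon_days", 90))
    if kind == "reinforcement_based":
        # Persists only while it keeps reappearing.
        return now - item["last_reinforced_at"] <= timedelta(days=item.get("reinforce_within_days", 30))
    return False
```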
Layers 03 and 04, the reasoning frameworks and the data reading guides, are the interpretation layer. The only part that changes per domain. Everything else is infrastructure that travels between systems.
Institutional knowledge lives in individual heads and walks out the door when people leave. The encoding process makes it structural. Through analysis of how your people actually read data and make decisions, we build reasoning frameworks that capture what factors matter, how to weight them, when a signal is significant versus noise, and what context changes the meaning.
The result is a cluster of 8 to 12 frameworks per domain that agents load and use to interpret data the way your best people would.
The meta-cognitive loop is the system checking whether its framework applies before producing output. Confident outputs from misapplied frameworks are more dangerous than uncertain outputs from the right one. When data falls outside the distribution the framework was calibrated on, the system flags it, tells you what specifically doesn't match, and adjusts its confidence. The expert who is most trustworthy over time is not the one who is most often right. It is the one who knows, most accurately, when they might be wrong.
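A stripped-down sketch of an applicability check, with an invented calibration range to show the flagging and the confidence adjustment; the scoring rule is a placeholder, not the actual mechanism.

```python
from dataclasses import dataclass

@dataclass
class ApplicabilityCheck:
    applies: bool
    confidence: float
    mismatches: list[str]

def check_framework_fit(data: dict, calibration: dict) -> ApplicabilityCheck:
    """Before producing output, check whether the data falls inside the
    distribution the framework was calibrated on. Report exactly what
    doesn't match and lower confidence accordingly."""
    mismatches = []
    for field_name, (low, high) in calibration.items():
        value = data.get(field_name)
        if value is None:
            mismatches.append(f"{field_name}: missing from input")
        elif not (low <= value <= high):
            mismatches.append(f"{field_name}={value} outside calibrated range [{low}, {high}]")
    confidence = max(0.0, 1.0 - 0.25 * len(mismatches))
    return ApplicabilityCheck(applies=not mismatches, confidence=confidence, mismatches=mismatches)

# e.g. a framework calibrated on mid-market accounts, applied to a much larger one
check = check_framework_fit(
    {"seat_count": 12000, "contract_age_months": 4},
    {"seat_count": (50, 5000), "contract_age_months": (6, 60)},
)
```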
Misalignment and obsolescence are the default outcome. Every agentic system drifts. The question is whether the architecture contains mechanisms that detect the drift and correct it, or whether it accumulates silently until trust collapses.
Observations become hypotheses become confirmed patterns. Only with human approval. The system proposes. Humans decide. No pattern becomes structural without someone reviewing the evidence. Autonomy is earned through demonstrated accuracy, and downgrades are immediate.
A competitive signal from eight months ago looks identical to yesterday's signal to a model. Without decay architecture, output slowly becomes wrong without anyone noticing. Each type of intelligence gets a custom expiration strategy: event-driven, velocity-based, or reinforcement-based. The system forgets what's no longer true.
Each agent loads exactly the frameworks it needs for its lens, at the right level of abstraction. A reader agent gets a data reading guide and one analytical framework. A pattern agent gets the full taxonomy. This prevents context rot and keeps the system consistent as the framework library grows.
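A small illustration of scoped loading; the agent roles and framework names are placeholders. Each scope names only what that lens needs, so context stays small as the library grows.

```python
# Illustrative loading scopes: each agent gets only the frameworks its
# lens needs, at the right level of abstraction.
LOADING_SCOPE = {
    "reader":    ["reading_guide.crm", "framework.usage_signals"],
    "pattern":   ["framework.taxonomy.full"],
    "synthesis": [],  # assembles only; loads no interpretive frameworks
}

def load_context(agent: str, library: dict) -> dict:
    """Load only the frameworks named in the agent's scope."""
    return {name: library[name] for name in LOADING_SCOPE.get(agent, [])}
```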
Each article addresses a specific way agents lose alignment in production and the design principle that prevents it. The intellectual foundation behind every system we build.