The methodology

How agents
learn to reason.

Context engineering gets the right information into the context window. That's necessary. It's not sufficient. Even when the model has the data, it doesn't know how to read it. It doesn't know what matters, what to ignore, what "empty" means versus what "missing" means. This methodology builds the scaffolding of judgment around the model so its capabilities land in exactly the form the situation demands.

The core thesis

A framework
is a reading guide,
not reference data.

A reasoning framework doesn't contain the answers. It contains the logic for finding them. What factors matter. What trade-offs exist. What signals carry weight. What vocabulary this company uses. The agent loads the framework and reads the data through it. The depth of application is controlled by the situation, not by the framework itself.

When the framework updates, every agent downstream reads through the new lens automatically. That's how the system stays aligned as the business evolves. The framework is the single point of change. Update the framework once, and every agent that loads it reasons differently from that point forward.

The learning cycle

When your team reviews an analysis and adjusts a classification, the system captures that correction. After enough similar corrections, it softly adjusts how it classifies. Those adjustments decay if they're not reinforced. The system doesn't over-learn from noise.

After sustained patterns over weeks, the system proposes a permanent change to the reasoning framework. A human reviews the evidence and approves. The framework evolves. The system you deploy in month one is not the system you have in month twelve.
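
The capture-and-decay loop described above can be sketched in a few lines. Everything here is illustrative: the `SoftAdjustment` name, the 14-day half-life, and the proposal threshold are assumptions, not the actual implementation.

```python
import time
from dataclasses import dataclass, field

HALF_LIFE_DAYS = 14.0        # assumed decay half-life for unreinforced adjustments
PROPOSE_THRESHOLD = 5.0      # assumed weight at which a permanent change is proposed

@dataclass
class SoftAdjustment:
    label: str                              # e.g. "reclassify X as Y"
    weight: float = 0.0
    last_reinforced: float = field(default_factory=time.time)

    def current_weight(self, now: float) -> float:
        # Exponential decay: the adjustment fades if nobody keeps correcting.
        days_idle = (now - self.last_reinforced) / 86_400
        return self.weight * 0.5 ** (days_idle / HALF_LIFE_DAYS)

    def reinforce(self, now: float) -> None:
        # Each similar correction adds weight on top of whatever survived decay.
        self.weight = self.current_weight(now) + 1.0
        self.last_reinforced = now

    def should_propose(self, now: float) -> bool:
        # Sustained reinforcement crosses the threshold; noise decays away first.
        return self.current_weight(now) >= PROPOSE_THRESHOLD
```

A one-off correction never reaches the threshold; six corrections in quick succession do, and months of silence pull the weight back below it.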

Design principles

What keeps
a system aligned.

Six principles that prevent drift. Each one addresses a specific way agents lose alignment with how the business thinks.

Frameworks, not rules

Rules encode decisions. Frameworks encode reasoning. When context shifts, rules break. Frameworks adapt. That's the difference between brittle automation and genuine intelligence.

Scope by lens, not by task

When one agent holds multiple perspectives, it finds coherence between them. Coherence isn't analysis. It's rationalisation. One lens per agent. Deep frameworks for that lens. Combine at the synthesis layer, which assembles but never evaluates. The tensions between perspectives are the information.

Name the noise

When a model encounters information it wasn't briefed on, the default is to include it. Pre-classify the distractions and give them somewhere to go that isn't the output. Capture them. Do not analyse them. That's another agent's job.

Structure is containment

If the output structure doesn't have a risks section, the model can't hallucinate risks into a factual update. If there's no "Key Gaps" field, gaps can't appear. Design the container, not just the content. The structure constrains what the model can get wrong.
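
One way to make the container concrete is a frozen schema: the field names below are invented, but the point is that the schema, not the prompt, decides what can exist in the output.

```python
from dataclasses import dataclass, asdict

# Hypothetical output container for a factual update. There is no "risks"
# or "key_gaps" field, so those categories cannot appear in the output.
@dataclass(frozen=True)
class FactualUpdate:
    subject: str
    what_changed: str
    source: str
    observed_at: str

def render(update: FactualUpdate) -> dict:
    # Only schema fields survive; anything else the model produced is dropped.
    return asdict(update)
```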

Ambiguity is information

Every system has the same pressure: resolve. Produce a single answer. A situation that supports three valid interpretations has not been understood when one is selected and the other two are discarded. Preserve the tensions. Resolution is the human's job.

Intelligence that travels

The interpretive structures underlying human judgment can be abstracted enough to generalise beyond their origin. The specific knowledge stays behind. The judgment that travels, even imperfectly, is the only kind that scales.

The architecture

Four on the
whiteboard.
Nine in production.

Every agentic system that works in production has four layers: orchestration, frameworks, skills, and metadata. Most people don't separate them cleanly. In practice, each splits because the functions inside them have different rules, different tempos, and different relationships to human control. The nine layers below are what four becomes when you build for real.

Routing: Classifies intent, sets analytical depth. When ambiguous, defaults to least analytical.

The routing layer classifies incoming requests by intent and sets the analytical depth before any agent runs. Ambiguous inputs default to the least analytical path — shallow analysis is correctable, deep analysis on the wrong track wastes context and compounds errors. The router loads only the classification framework, never the domain frameworks. It decides which agents run, not what they conclude.
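
A minimal sketch of that default-to-shallow rule, with invented intent labels and an assumed confidence margin:

```python
# Hypothetical router sketch. The intent labels, depth levels, and the
# ambiguity rule (narrow margin -> shallowest depth) are all assumptions.
DEPTH_BY_INTENT = {
    "status_check": "shallow",
    "trend_question": "standard",
    "root_cause": "deep",
}

def route(intent_scores: dict[str, float], min_margin: float = 0.2) -> str:
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else (None, 0.0)
    # Ambiguous classification defaults to the least analytical path:
    # shallow analysis is correctable, deep analysis on the wrong track is not.
    if top[1] - runner_up[1] < min_margin:
        return "shallow"
    return DEPTH_BY_INTENT.get(top[0], "shallow")
```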

Orchestration: Manages flow, sequencing, parallelism, the human boundary.

Orchestration governs agent sequencing, parallelism, and the human boundary. Agents that share an analytical dimension never run in parallel — their outputs would contaminate each other. The human boundary is placed at exactly one point in each flow: where observations become structural changes. The orchestrator manages state but never evaluates content.
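
The no-shared-dimension constraint can be expressed as a scheduling rule. This is a sketch under the assumption that each agent declares exactly one analytical dimension:

```python
from collections import defaultdict

def plan_waves(agents: dict[str, str]) -> list[list[str]]:
    """agents maps agent name -> analytical dimension.

    Agents sharing a dimension land in different waves, so no wave
    (the unit that runs in parallel) ever contains two of them.
    """
    by_dim = defaultdict(list)
    for name, dim in agents.items():
        by_dim[dim].append(name)
    waves = []
    i = 0
    while True:
        # Wave i takes the i-th agent from each dimension's queue.
        wave = [names[i] for names in by_dim.values() if i < len(names)]
        if not wave:
            return waves
        waves.append(wave)
        i += 1
```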

Reasoning frameworks: Encoded expertise. How your best people read this domain.

A reasoning framework encodes the interpretive logic your best people apply to data. Not the answers — the method. What factors matter, how to weight competing signals, what vocabulary this company uses, and what context changes the meaning. Frameworks are versioned. When a framework updates, every agent downstream reads through the new lens automatically. A typical domain requires a cluster of 8 to 12 frameworks covering different analytical dimensions.

Data reading guides: How to query your systems. What to trust. What "stale" means.

Data reading guides teach agents how to query your specific systems. What each field actually means versus what it appears to mean. Which data sources are authoritative. What "empty" means versus "missing" versus "not yet populated." When a timestamp is reliable and when it reflects the last sync, not the last change. Without reading guides, agents query correctly but interpret incorrectly.
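
A reading guide can be as simple as per-field rules for what absence means. The field names and rules below are invented for illustration:

```python
# Hypothetical reading-guide sketch: what an absent value actually means
# in this particular system, field by field.
READING_GUIDE = {
    # Field did not exist on older records: absence means "not yet populated".
    "renewal_date": {"absent_means": "not_yet_populated"},
    # Reps clear this field deliberately: absence means "empty".
    "competitor": {"absent_means": "empty"},
    # The sync job drops this field on failure: absence means "missing".
    "usage_score": {"absent_means": "missing"},
}

def interpret(field: str, record: dict) -> str:
    if field in record and record[field] not in (None, ""):
        return "present"
    return READING_GUIDE.get(field, {}).get("absent_means", "unknown")
```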

Skills: Individual agents. One lens each. Bounded scope, explicit prohibitions.

Each skill is a single agent with one analytical lens. The skill definition specifies what the agent should analyse, what it must never analyse, and what it should capture without evaluating. Prohibitions prevent scope creep — they are as important as the positive instructions. A noise-naming strategy gives the agent somewhere to put information that isn't its job. The structure constrains what the model can get wrong.
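
A skill definition with prohibitions and a capture bucket might look like this. The lens, topics, and `triage` helper are all illustrative assumptions:

```python
# Hypothetical skill definition: one lens, explicit prohibitions, and a
# capture bucket for information that is another agent's job.
PRICING_SKILL = {
    "lens": "pricing",
    "analyse": ["discount depth", "plan changes"],
    "never_analyse": ["sentiment", "support quality"],
    "capture_only": ["churn signals"],  # recorded, never evaluated here
}

def triage(skill: dict, topic: str) -> str:
    if topic in skill["analyse"]:
        return "analyse"
    if topic in skill["capture_only"]:
        return "capture"
    # Prohibited and unbriefed topics alike stay out of the output.
    return "ignore"
```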

Context assembly: Translates between agents. Assembles, never evaluates.

The assembly layer combines outputs from multiple agents into a single coherent view. It translates formats, resolves schema differences, and maintains provenance. The critical constraint: assembly never evaluates. It does not weigh one agent's output against another. It does not resolve tensions. The tensions between perspectives are the information. Resolution is the human's job.

Metadata: Provenance, traceability, audit. Every write is traced.

Every observation, classification, and pattern carries full provenance: which agent produced it, which framework version it read through, what data sources informed it, and when. This isn't logging — it's the mechanism that makes the system auditable and debuggable. When an output is wrong, metadata lets you trace exactly which framework, which data, and which agent produced it.
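
A provenance stamp of that shape, sketched with assumed field names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical provenance record attached to every write: which agent,
# which framework version, which sources, and when.
@dataclass(frozen=True)
class Provenance:
    agent: str
    framework: str
    framework_version: str
    sources: tuple[str, ...]
    produced_at: str

def stamp(agent: str, framework: str, version: str, sources: list[str]) -> Provenance:
    return Provenance(
        agent=agent,
        framework=framework,
        framework_version=version,
        sources=tuple(sources),
        produced_at=datetime.now(timezone.utc).isoformat(),
    )
```

Because the record is frozen and required at write time, tracing a wrong output back to its framework version is a lookup, not an investigation.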

Learning: Soft adjustments decay. Permanent changes are human-gated.

The learning layer is how the system stays aligned as your business evolves. When your team corrects an analysis or the data reveals a pattern shift, the system captures it. Soft adjustments form immediately and influence future reasoning, but decay if not reinforced — the system won't over-learn from noise. When a pattern sustains across enough observations and enough time, the system proposes a permanent change to the reasoning frameworks. Humans review the evidence and decide. The system you deploy in month one reasons differently by month six — not because it drifted, but because it learned what your business learned.

Lifecycle: Decay, archival, cleanup. Stale data never silently informs live decisions.

Every type of intelligence gets a custom expiration strategy. Event-driven data expires on the next event. Velocity-based signals decay over time. Reinforcement-based patterns persist as long as they keep appearing. Without lifecycle management, a competitive signal from eight months ago looks identical to yesterday's signal to the model. The system forgets what's no longer true.
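
The three strategies can be sketched as one liveness check. Half-lives, thresholds, and field names are assumptions:

```python
# Hypothetical lifecycle sketch: one expiration strategy per intelligence type.
def is_live(item: dict, now_days: float) -> bool:
    strategy = item["strategy"]
    if strategy == "event_driven":
        # Expires the moment a superseding event has occurred.
        return not item.get("superseded", False)
    if strategy == "velocity_based":
        # Signal strength decays with age; below a floor it goes stale.
        age = now_days - item["observed_day"]
        return item["strength"] * 0.5 ** (age / item["half_life_days"]) >= 0.1
    if strategy == "reinforcement_based":
        # Persists only while it keeps being observed.
        return now_days - item["last_seen_day"] <= item["max_gap_days"]
    return False   # unknown strategies never silently inform live decisions
```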

The interpretation layer

Layers 03 and 04 are the interpretation layer. The only part that changes per domain. Everything else is infrastructure that travels between systems.

Knowledge encoding

How expertise
becomes framework.

Institutional knowledge lives in individual heads and walks out the door when people leave. The encoding process makes it structural. Through analysis of how your people actually read data and make decisions, we build reasoning frameworks that capture what factors matter, how to weight them, when a signal is significant versus noise, and what context changes the meaning.

The result is a cluster of 8 to 12 frameworks per domain that agents load and use to interpret data the way your best people would.

What gets encoded

The combination of signals that together predict an outcome. Not the individual data points. The pattern of signals, the weighting between them, and the context that changes their meaning. That's what your best people use. That's what gets captured.

Two types of knowledge

Specific knowledge can't be encoded. Interpretive structure can. The test: remove the proper nouns. If the principle is still true, it travels. If it depends on knowing a specific person or a specific deal, it stays behind.

Framework templates

Domain-calibrated starter frameworks from patterns across deployments. 60 to 70% of the way there, with markers showing exactly where your company's specific judgment goes. Your answers fill the gaps. The agents read through the result.

Meta-cognition

Agents that
question their
own thinking.

The meta-cognitive loop is the system checking whether its framework applies before producing output. Confident outputs from misapplied frameworks are more dangerous than uncertain outputs from the right one. When data falls outside the distribution the framework was calibrated on, the system flags it, tells you what specifically doesn't match, and adjusts its confidence. The expert who is most trustworthy over time is not the one who is most often right. It is the one who knows, most accurately, when they might be wrong.
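
The applicability check before output amounts to comparing inputs against the framework's calibration range. The ranges and the halving penalty below are placeholders:

```python
# Hypothetical applicability check run before an agent produces output.
# Calibration ranges and the confidence penalty are assumptions.
CALIBRATION = {
    "deal_size": (10_000, 500_000),   # range the framework was built on
    "cycle_days": (14, 180),
}

def check_applicability(inputs: dict) -> tuple[float, list[str]]:
    confidence, mismatches = 1.0, []
    for field, (lo, hi) in CALIBRATION.items():
        value = inputs.get(field)
        if value is None or not (lo <= value <= hi):
            # Say what specifically doesn't match, and lower confidence.
            mismatches.append(f"{field}={value!r} outside calibrated range [{lo}, {hi}]")
            confidence *= 0.5
    return confidence, mismatches
```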

Alignment infrastructure

What keeps
it calibrated.

Misalignment and obsolescence are the default outcomes. Every agentic system drifts. The question is whether the architecture contains mechanisms that detect the drift and correct it, or whether it accumulates silently until trust collapses.

The human confirmation gate

Observations become hypotheses become confirmed patterns. Only with human approval. The system proposes. Humans decide. No pattern becomes structural without someone reviewing the evidence. Autonomy is earned through demonstrated accuracy, and downgrades are immediate.
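
The gate is a small state machine in which only one transition requires a human. The state names match the text; the evidence threshold is an assumption:

```python
# Hypothetical promotion gate: the system can move an observation to
# hypothesis on its own, but only a human approval confirms a pattern.
def promote(state: str, evidence_count: int, human_approved: bool = False,
            min_evidence: int = 3) -> str:
    if state == "observation" and evidence_count >= min_evidence:
        return "hypothesis"               # the system proposes
    if state == "hypothesis" and human_approved:
        return "confirmed"                # the human decides
    return state

def downgrade(state: str) -> str:
    # Downgrades are immediate and unconditional.
    return "observation"
```

No amount of evidence alone moves a hypothesis to confirmed; the approval flag is the only path.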

Data decay as architecture

A competitive signal from eight months ago looks identical to yesterday's signal to a model. Without decay architecture, output slowly becomes wrong without anyone noticing. Each type of intelligence gets a custom expiration strategy: event-driven, velocity-based, or reinforcement-based. The system forgets what's no longer true.

Loading discipline

Each agent loads exactly the frameworks it needs for its lens, at the right level of abstraction. A reader agent gets a data reading guide and one analytical framework. A pattern agent gets the full taxonomy. This prevents context rot and keeps the system consistent as the framework library grows.
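
Loading discipline reduces to a manifest: each role names exactly what it may load, and the library can grow without any agent's context growing with it. Role and framework names here are invented:

```python
# Hypothetical loading manifest: each agent role lists exactly the
# frameworks it may load, at its level of abstraction.
LOAD_MANIFEST = {
    "reader": ["crm_reading_guide", "pricing_framework"],
    "pattern": ["full_taxonomy"],
}

def load_context(role: str, library: dict[str, str]) -> dict[str, str]:
    allowed = LOAD_MANIFEST.get(role, [])
    # Only manifest entries are loaded, however large the library grows.
    return {name: library[name] for name in allowed if name in library}
```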

Go deeper

The design
philosophy.

Each article addresses a specific way agents lose alignment in production and the design principle that prevents it. The intellectual foundation behind every system we build.

Read the design philosophy → or start a conversation