The Four Layers of Agentic Architecture
Every agentic system that works in production has four layers: orchestration, frameworks, skills, and metadata. Most people building them don't separate the layers cleanly, and that's where the problems start.
But four is the simplified version. The version that fits on a whiteboard. In practice, several of those layers split -- because the functions inside them have different rules, different tempos, and different relationships to human control. What starts as four becomes nine. Understanding the four gets you started. Understanding the nine is what keeps the system alive.
Let me start with the four. Then I'll show you where they crack.
Layer 1: Orchestration
Orchestration is the traffic control. It determines which agents run, in what order, with what dependencies, and what happens when something fails.
Orchestration doesn't make decisions about content, analysis, or output quality. It manages flow. A piece of work enters the system. The orchestrator knows: step one runs first, steps two through four run in parallel, step five waits until all three parallel steps complete, step six takes the combined output. If step three fails, the orchestrator knows whether to retry, proceed degraded, or halt.
The critical principle: the orchestrator never reasons about the work itself. It reasons about the process. It tracks status, manages dependencies, enforces timeouts, counts revision loops, and knows when to stop.
What orchestration handles: sequencing and parallelism. State management -- where is each piece of work right now, which agent touched it last. Failure handling -- timeouts, retries, escalation. Revision management with a hard cap, because models hate stopping, and left unchecked a revision loop will run indefinitely. And the human boundary -- placing the human at the right point in the flow and managing both paths of human response.
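Concretely, a minimal orchestrator is just a dependency-aware loop with retry and halt policies. This is an illustrative sketch, not a prescription; the step and status names are invented:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"

@dataclass
class Step:
    name: str
    depends_on: list[str] = field(default_factory=list)
    max_retries: int = 2
    status: Status = Status.PENDING

def run_flow(steps: dict[str, Step], run_step) -> None:
    """Pure flow control: sequencing, dependencies, retries. Never reasons about the work itself."""
    while any(s.status is Status.PENDING for s in steps.values()):
        progressed = False
        for step in steps.values():
            ready = step.status is Status.PENDING and all(
                steps[d].status is Status.DONE for d in step.depends_on
            )
            if not ready:
                continue
            for _ in range(step.max_retries + 1):
                if run_step(step.name):      # the skill does the actual work; True means success
                    step.status = Status.DONE
                    break
            else:
                step.status = Status.FAILED  # retries exhausted: escalate, degrade, or halt per policy
            progressed = True
        if not progressed:
            break  # nothing runnable (e.g. blocked on a failed dependency): halt rather than spin
```

A revision cap works the same way: a counter the orchestrator increments and checks against a hard limit, never a judgment about whether the revision was worth doing.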
An orchestrator that starts making content decisions is an orchestrator that's absorbed a skill it shouldn't have. Keep it pure.
Layer 2: Frameworks
Frameworks are the system's judgment. They encode how your best people think about the data, the domain, and the decisions that need to be made.
Frameworks define how to read data correctly: what does an empty field mean, when is a data point stale, how do you handle pagination, which fields are reliable and which are free-text entered by humans with varying discipline. They define how to reason about what you find: what signals matter, how to classify them, what confidence level is required before acting, how to decompose unstructured text into structured intelligence. And they define how deeply to apply analysis: when is scoring appropriate versus just reporting facts, when does absence become a signal versus just an empty field.
Frameworks change slowly. They represent accumulated expertise. When they do change, it should be deliberate -- proposed with evidence, reviewed by a human, approved explicitly.
The relationship between frameworks and skills is the most important architectural boundary in the system. Frameworks define how to think. Skills define what to do. A skill loads only the frameworks it needs and applies them at the depth the situation requires. A reporting skill doesn't need a risk assessment framework. If it loads one, it will find risks. That's not the skill being smart. That's the skill being contaminated by a framework it shouldn't have.
Layer 3: Skills
Skills are the agents themselves. Each one has a defined scope, a defined input, a defined output, and explicit rules about what it does and what it does not do.
The most important word in that sentence is "not." Every skill needs a boundary -- not just a job description, but an explicit prohibition list. Without it, agents drift. They hallucinate scope. A people-finding skill starts commenting on pipeline health. An evidence-selection skill starts reshaping content direction. A validation skill starts rewriting instead of evaluating. These aren't malfunctions. They're what models do when you leave a vacuum -- they fill it with capability nobody asked for.
A well-defined skill has a single responsibility, explicit input/output contracts, a list of what it does NOT do, defined failure handling, and framework loading rules that specify the minimum set of frameworks it needs -- because if it can't access a framework, it can't misuse it.
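One way to keep that honest is to make the contract a declarative object that both the orchestrator and a human reviewer can read. A sketch, with hypothetical field names and an invented example skill:

```python
from dataclasses import dataclass

@dataclass
class SkillContract:
    name: str
    responsibility: str              # exactly one job
    inputs: dict[str, str]           # field -> expected shape
    outputs: dict[str, str]
    prohibitions: list[str]          # the explicit "does NOT do" list
    frameworks: list[str]            # the minimum set it is allowed to load
    on_failure: str = "halt"         # defined failure handling, not improvisation

evidence_selection = SkillContract(
    name="evidence_selection",
    responsibility="Select the strongest supporting evidence for a given claim.",
    inputs={"claim": "str", "candidates": "list of evidence records"},
    outputs={"selected": "list of evidence references", "confidence": "high/medium/low"},
    prohibitions=[
        "Does not reshape content direction.",
        "Does not comment on pipeline health.",
        "Does not rewrite the claim it was given.",
    ],
    frameworks=["evidence_quality_guide"],   # if it can't load a framework, it can't misuse it
)
```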
Layer 4: Metadata
Metadata is the system's memory. Every decision tagged. Every adjustment traced. Every human correction linked back to the output it corrected and the agent that produced it.
Metadata serves three functions: provenance (tracing every output back to its inputs, frameworks, and skill version), learning signal (linking human corrections to the specific context that produced the output), and lifecycle management (tracking when things were created, last reinforced, and last used so stale intelligence can be identified and managed).
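A record that serves all three can stay small. A sketch, with field names that are illustrative rather than a schema recommendation:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class OutputRecord:
    output_id: str
    skill: str                                # which agent produced it
    skill_version: str                        # provenance: diagnose or reproduce later
    frameworks_used: list[str]                # which guides shaped the reasoning
    input_refs: list[str]                     # trace back to the source data
    created_at: datetime
    last_reinforced: datetime | None = None   # lifecycle: when this was last confirmed useful
    correction_ref: str | None = None         # learning signal: the human correction, if any

record = OutputRecord(
    output_id="out-381",
    skill="signal_interpretation",
    skill_version="2.4.0",
    frameworks_used=["signal_taxonomy"],
    input_refs=["crm-note-9912"],
    created_at=datetime.now(timezone.utc),
)
```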
Without metadata, you have agents producing output but no record of why. The system works but it's a black box. When something goes wrong, you can't diagnose it. When something goes right, you can't reproduce it.
Where the four become nine
Those four layers are real. They're the right starting point for any agentic architecture. But if you build a production system against them, you'll discover that several contain distinct functions that need to be separated -- because they have different rules, different change tempos, and different failure modes.
Here's where the splits happen.
Split 1: Routing separates from Orchestration
Orchestration manages flow -- what runs when, in what order, with what dependencies. But there's a function that needs to happen before orchestration begins: deciding how the system should approach this specific piece of work.
A routing layer classifies intent. Is this a factual query, a diagnostic analysis, or an investigative search? The answer determines which frameworks get loaded, at what depth, and which sections of the output structure are available. This isn't flow control. This is mode-setting. It determines the character of the work before the work begins.
In the simple four-layer model, routing lives inside orchestration. In practice, it needs to be separate -- because routing decisions are about judgment (how should we think about this?) while orchestration decisions are about logistics (what runs next?). Collapsing them means your traffic controller is also your strategic advisor. Those are different jobs.
Routing also needs different defaults. Orchestration defaults should be conservative about flow -- retry, degrade gracefully, halt if necessary. Routing defaults should be conservative about depth -- when intent is ambiguous, assume the user wants facts, not analysis. Never default to the most analytical mode. These opposing default logics confirm they're separate functions.
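A sketch of what that looks like, with invented mode names and an arbitrary confidence cut-off standing in for whatever classifier you actually use:

```python
from enum import Enum

class Mode(Enum):
    FACTUAL = "factual"                # pull and report facts, minimal frameworks
    DIAGNOSTIC = "diagnostic"          # apply analysis frameworks
    INVESTIGATIVE = "investigative"    # deepest framework loading

# Which frameworks each mode is allowed to load, at what depth (names are hypothetical).
MODE_FRAMEWORKS = {
    Mode.FACTUAL: ["data_reading_guide"],
    Mode.DIAGNOSTIC: ["data_reading_guide", "signal_taxonomy"],
    Mode.INVESTIGATIVE: ["data_reading_guide", "signal_taxonomy", "qualification_framework"],
}

def route(intent_label: str, confidence: float) -> Mode:
    """Set the mode before any work runs. Ambiguity defaults to facts, never to deeper analysis."""
    if confidence < 0.7:               # ambiguous intent: assume the user wants facts
        return Mode.FACTUAL
    return {
        "factual_query": Mode.FACTUAL,
        "diagnostic_analysis": Mode.DIAGNOSTIC,
        "investigative_search": Mode.INVESTIGATIVE,
    }.get(intent_label, Mode.FACTUAL)  # unknown intents also fall back to the factual mode
```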
Split 2: Frameworks split into Data Reading Guides and Reasoning Frameworks
Inside the frameworks layer, there are two distinct types doing different work. Both are reading guides -- both teach the agent how to interpret what it encounters. But they come from different sources of knowledge, they're written by different people in practice, and they change for different reasons.
Data reading guides teach the agent how to interact with your actual systems. How to query correctly. What to trust. What to cross-reference. What the data architecture looks like underneath.
Never infer what you didn't pull. Always distinguish empty from missing. Never trust a single data layer. Always check data freshness. Never display raw internal identifiers. Handle pagination. Respect filter limits. Scope queries to the right pipeline before interpreting stage labels. These are operational rules. They come from knowing the system -- its quirks, its limitations, its field-level reliability. They change when the data architecture changes: new fields, new APIs, new systems integrated, old fields deprecated.
Without data reading guides, the agent queries confidently, gets results, and misinterprets them. It assumes a single API call returned the full dataset when it only got the first page. It reports on a field without checking when it was last updated. It displays an internal stage ID as if it were a human-readable label. These aren't dramatic hallucinations. They're compounding inaccuracies that make every downstream analysis unreliable.
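Several of those rules can be enforced in the tooling rather than trusted to the prompt. A sketch of two of them, pagination and freshness, against a hypothetical CRM client whose API is invented for illustration:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)   # assumption: anything older than this gets flagged, not trusted

def fetch_all(client, query: dict) -> list[dict]:
    """Never assume one call returned the full dataset: follow pagination to the end."""
    records, cursor = [], None
    while True:
        page = client.search(query, cursor=cursor)    # hypothetical client method
        records.extend(page["results"])
        cursor = page.get("next_cursor")
        if not cursor:
            return records

def field_state(record: dict, name: str, now: datetime) -> str:
    """Distinguish missing from empty from stale before anyone interprets the value."""
    if name not in record:
        return "missing"                              # never pulled; don't infer it
    if record[name] in (None, ""):
        return "empty"                                # pulled, and genuinely blank
    updated = datetime.fromisoformat(record[f"{name}_updated_at"])  # assumed freshness field
    return "stale" if now - updated > STALE_AFTER else "current"
```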
Reasoning frameworks teach the agent how to think about what it found. How to decompose a messy note into distinct signals. How to classify each signal against a defined taxonomy. How to assess confidence -- high, medium, low -- with explicit criteria for each. How to attribute a signal to the right person and assess that person's role from behaviour, not just title. How to score qualification. How to determine when a pattern is significant versus noise.
A reasoning framework for signal interpretation says: write what the signal means, not what was written. Notes are shorthand from busy people. A single entry might contain four distinct signals -- an urgency deadline, a budget risk, a competitive threat, and an implied stakeholder who hasn't engaged. The framework teaches the agent to decompose that, classify each signal, assess confidence, and interpret what it means in context.
A reasoning framework for qualification says: here are the elements, here's how to score each one, here's what constitutes evidence versus assumption, here's the threshold below which you flag "insufficient data" instead of guessing.
These come from knowing the business -- accumulated expertise about what matters, what signals carry weight, how experienced operators think about data. They change when the business changes: new customer segments, new qualification criteria, new signal patterns emerging from the market.
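The scoring half of a framework like the qualification one above reduces to data plus a threshold rule. A sketch, with invented elements and weights:

```python
# Hypothetical qualification elements and their weights.
ELEMENTS = {
    "budget_confirmed": 0.3,
    "decision_maker_engaged": 0.3,
    "timeline_stated": 0.2,
    "pain_quantified": 0.2,
}
MIN_EVIDENCED = 3   # below this many evidenced elements, flag instead of guessing

def score_qualification(evidence: dict[str, bool]) -> dict:
    """Score only what has evidence; assumptions never count toward the score."""
    evidenced = [name for name in ELEMENTS if evidence.get(name)]
    if len(evidenced) < MIN_EVIDENCED:
        return {"status": "insufficient_data", "evidenced": evidenced}
    return {
        "status": "scored",
        "score": round(sum(ELEMENTS[name] for name in evidenced), 2),
        "evidenced": evidenced,
    }
```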
The split matters because data reading guides and reasoning frameworks change for different reasons and at different rates. Your query pagination rules don't change when your sales methodology evolves. Your signal taxonomy doesn't change when you add a new API endpoint. An agent can load a data reading guide without a reasoning framework (when it just needs to pull accurate data) and a reasoning framework without a data reading guide (when it's interpreting data that's already been pulled). They're both reading guides in the broad sense. They do different jobs.
Split 3: Learning separates from Metadata
In the four-layer model, learning is a function that reads metadata. In practice, learning is a complete subsystem with its own rules, its own tempo, and its own relationship to human control.
Metadata records what happened. Learning detects patterns in what happened and proposes changes based on those patterns. These are fundamentally different activities.
The learning layer has rules that don't apply to metadata: a single observation never triggers a system change. Five confirmed instances produce a soft runtime adjustment -- automatic, weighted, decaying. Ten instances over four weeks produce a proposal for permanent change -- human-gated, explicit, with preserved rollback. Adjustments that aren't reinforced lose confidence over time and eventually archive.
It also has its own tempo. Metadata is written continuously, as work flows through the system. Learning operates on cycles -- per-event observation, weekly pattern analysis, monthly structural review. These cycles have minimum thresholds, confidence requirements, and decay schedules that don't exist in the metadata layer.
And it has a unique relationship to human control. Metadata is a record -- it doesn't change the system's behaviour. Learning changes behaviour, and the distinction between automatic changes (soft adjustments) and human-gated changes (permanent structural updates) is the most important governance boundary in the entire architecture.
When learning is treated as "metadata plus some logic," these governance boundaries get blurry. Separate the layers and the boundaries stay sharp.
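Those counts and gates are concrete enough to sketch. The thresholds below come straight from the rules above; the structure, the names, and the return values are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SOFT_THRESHOLD = 5            # confirmed instances -> automatic, weighted, decaying adjustment
HARD_THRESHOLD = 10           # instances within the window -> human-gated proposal
HARD_WINDOW = timedelta(weeks=4)

@dataclass
class ObservedPattern:
    description: str
    observations: list[datetime]   # timestamps of confirmed instances

def governance_action(pattern: ObservedPattern, now: datetime) -> str:
    """A single observation never changes the system; volume and recency decide the path."""
    recent = [t for t in pattern.observations if now - t <= HARD_WINDOW]
    if len(recent) >= HARD_THRESHOLD:
        return "propose_permanent_change"   # explicit, human-approved, rollback preserved
    if len(pattern.observations) >= SOFT_THRESHOLD:
        return "apply_soft_adjustment"      # runtime weight that decays if not reinforced
    return "record_only"                    # metadata keeps the observation; behaviour doesn't change
```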
Split 4: Lifecycle separates from Metadata
Metadata tracks when things were created and last updated. Lifecycle acts on that information -- decaying what's stale, archiving what's expired, extending what's actively in use.
This is a distinct function that runs on its own schedule, before anything else in the system starts its day. By the time agents begin their work, the intelligence layer has been cleaned. Stale signals have been aged or archived. Expired suggestions have been removed. Learned adjustments that haven't been reinforced have had their confidence reduced. Campaign deadlines have been checked.
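A sketch of that morning run, assuming the kind of timestamps the metadata layer already records; the decay rules and thresholds are purely illustrative:

```python
from datetime import datetime, timedelta, timezone

SIGNAL_MAX_AGE = timedelta(days=90)    # assumption: market signals older than this are archived
REINFORCE_WITHIN = timedelta(days=14)  # assumption: adjustments must be reinforced this often
DECAY = 0.8                            # unreinforced adjustments lose confidence each run

def lifecycle_sweep(signals, adjustments, suggestions, now=None):
    """Runs before any agent starts work: age, archive, decay, expire."""
    now = now or datetime.now(timezone.utc)
    for signal in signals:
        if now - signal["created_at"] > SIGNAL_MAX_AGE:
            signal["status"] = "archived"            # no longer presented as current
    for adj in adjustments:
        last = adj.get("last_reinforced")
        if last is None or now - last > REINFORCE_WITHIN:
            adj["confidence"] *= DECAY               # fades quietly instead of lingering
            if adj["confidence"] < 0.2:
                adj["status"] = "archived"
    for suggestion in suggestions:
        if now > suggestion["expires_at"]:
            suggestion["status"] = "expired"         # removed from the active queue
```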
Without a dedicated lifecycle layer, stale intelligence accumulates. An agent reads a market signal from three months ago and treats it as current. A learned adjustment from a context that no longer applies quietly influences output. A suggestion that expired two weeks ago sits in the queue looking active. These aren't dramatic failures. They're slow degradation -- the system getting slightly less trustworthy every day without anyone noticing.
Lifecycle management is not glamorous. It's the 6AM janitorial run that nobody sees. But it's the reason the system stays clean enough to trust.
Split 5: Context Assembly separates from Skills
There's a function that sits between the skills that produce structured data and the agent that needs to act on that data -- and it's neither a skill nor an orchestration concern. It's a translation layer.
Multiple agents produce structured outputs: classification results, format recommendations, evidence packages, diversity warnings. These outputs contain system noise -- internal IDs, confidence percentages, selection reasoning, anti-repetition scores. They may contain contradictions -- one agent recommends a hook type that another agent flagged as overused. They reference frameworks and data structures that the downstream consumer shouldn't need to know about.
Context assembly reads all of these structured outputs, strips the system noise, resolves contradictions (with explicit rules about which agent wins in which conflict), checks live data sources for current context, and produces a clean brief that the downstream agent can act on without parsing structured data or making judgment calls about conflicting signals.
This is not a skill. Skills have domain responsibilities -- they classify, they evaluate, they select, they generate. Context assembly has no domain opinion. It's a translator. And it's not orchestration -- the orchestrator manages when this step runs, but the assembly logic is about what reaches the agent's context window and in what form.
When context assembly is collapsed into either skills or orchestration, one of two things happens: either the downstream agent receives raw structured data and has to parse and resolve it itself (introducing a new failure surface), or the orchestrator starts making content-level decisions about what to include and how to resolve conflicts (violating its traffic-control purity).
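A sketch of the translation step itself. The noise fields and the precedence order are illustrative; the point is that conflict resolution is an explicit, inspectable rule rather than a judgment the downstream agent has to make:

```python
SYSTEM_NOISE = {"internal_id", "confidence_pct", "selection_reasoning", "anti_repetition_score"}

# Explicit precedence: when two agents disagree on the same field, this order decides who wins.
PRECEDENCE = ["diversity_checker", "format_recommender", "classifier"]

def assemble_brief(agent_outputs: dict[str, dict]) -> dict:
    """Strip system noise, resolve conflicts by fixed precedence, emit a clean brief."""
    brief = {}
    # Lower-precedence agents write first; higher-precedence agents overwrite on conflict.
    for agent in reversed(PRECEDENCE):
        for key, value in agent_outputs.get(agent, {}).items():
            if key not in SYSTEM_NOISE:        # the downstream agent never sees internals
                brief[key] = value
    return brief

brief = assemble_brief({
    "classifier": {"hook_type": "statistic", "internal_id": "cls-77"},
    "diversity_checker": {"hook_type": "question"},   # wins: it flagged the statistic hook as overused
})
assert brief == {"hook_type": "question"}
```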
The full picture
Four layers on the whiteboard. Nine in production:
1. Routing -- classifies intent, sets analytical mode and framework depth
2. Orchestration -- manages flow, sequencing, parallelism, state, the human boundary
3. Data Reading Guides -- operational rules for query accuracy, field trust, freshness, pagination
4. Reasoning Frameworks -- encoded expertise for interpretation, signal taxonomy, confidence, qualification
5. Skills -- the agents, each with defined scope, contracts, and prohibitions
6. Context Assembly -- translates structured outputs into clean briefs, resolves inter-agent conflicts
7. Metadata -- records decisions for provenance, traces every output back to its inputs, frameworks, and skill version
8. Learning -- detects patterns and proposes changes with two-speed governance. Soft adjustments decay. Permanent changes are human-gated
9. Lifecycle -- maintains intelligence freshness through decay, archival, and cleanup
Metadata and learning are deeply coupled -- learning reads metadata, and the metadata schema is designed around what learning needs to detect. But the governance boundary between them is the reason they're separate layers. Metadata records passively. Learning changes behaviour actively. Collapsing them blurs the most important governance boundary in the entire architecture.
Where the boundaries fall depends on the tool
How you implement these nine layers depends entirely on what you're building on.
In a system built on Claude's skill architecture, skills are SKILL.md files. Data reading guides and reasoning frameworks live as reference files that skills load. Orchestration and routing are system skills. Context assembly is a dedicated system skill. Metadata lives in a database. Learning is a dedicated agent with its own cycles. Lifecycle is a cron-triggered system skill.
In a system built on LangChain or CrewAI, the boundaries look different. Skills might be tool definitions or agent configurations. Frameworks might be system prompts or retrieval sources. Orchestration might be a chain or graph definition. Context assembly might be a pre-processing step in the chain. Metadata might be a state object. Learning might be an external service.
In a system built on a custom runtime with workflow engines, orchestration is the workflow engine itself. Skills are API calls or function invocations. Frameworks are loaded from a knowledge store. Context assembly is a dedicated step in the workflow. Metadata is events in a log or entries in a database.
The layers are the same. The implementation varies. What matters is that they're separated -- because a skill that contains orchestration logic can't be recomposed into a different flow, a framework embedded inside a skill can't be shared or updated independently, metadata mixed into output can't be queried for learning signals, and context assembly buried inside generation creates a failure surface nobody can debug.
Why separation matters
Systems that work don't stay static. The business changes. The data changes. What "good" looks like changes.
When the layers are separated, evolution is surgical. Change how work flows? Update the orchestrator. Skills stay the same. Refine how agents query your CRM? Update the data reading guide. Refine how they interpret what they find? Update the reasoning framework. Add a new capability? Add a new skill. The orchestrator adds it to the flow. Everything else stays the same.
When the layers are collapsed, every change is a rewrite. And every rewrite risks breaking something that was working because the thing you wanted to change was entangled with something you didn't.
Four layers get you started. Nine keep you honest. Separate them, whatever the tool. The architectural clarity is worth more than the convenience of keeping everything in one place.