Custom Learning Cycles
Why learning cycles need to be designed per use case, and how structuring observation, pattern detection, and decay produces systems that develop genuine expertise.
I want to start with something I keep coming back to when building learning into agentic systems. The way a system learns needs to match the shape of the judgment it is trying to improve. This sounds obvious when you say it out loud, but in practice it is remarkably easy to design a single feedback loop and apply it everywhere, assuming that the mechanism of learning can be generic even when the subject of learning is not.
The reason this matters is that different domains produce fundamentally different kinds of learning signals. A content editing workflow generates clean, visible deltas between a draft and its published version. You can see exactly what the human changed. The signal is structural, contained, and unambiguous. But a system that tracks shifting priorities across a portfolio of relationships is working with something else entirely. The signals are scattered, implicit, and emerge over time rather than appearing in a single correction event. If you try to capture both of these with the same feedback mechanism, you end up with a system that learns well in one domain and produces noise in the other.
I think this is the core problem. Generic learning cycles produce generic improvements, and generic improvements are only useful for narrow, repetitive tasks where the correction signal is clean and the context is stable. The moment the system touches messy, evolving data, a single feedback architecture starts actively degrading performance by treating fundamentally different learning problems as if they were interchangeable.
Different domains produce different learning signals.
I want to make this concrete at the conceptual level, because I think it clarifies why the architecture has to vary.
In a domain where the human directly edits the agent's output, the learning signal is a delta. The system can compare what the agent produced with what the human chose to publish, and the difference between the two is a precise, self-contained correction. The observation is bounded by the artefact.
In a domain where the system is trying to detect whether something has shifted in the world, the learning signal is not a delta at all. It is a collection of weak signals, scattered across different sources, written by different people at different times, none of which is individually conclusive. The observation is not bounded by an artefact. It is bounded by a question the system is trying to answer over time.
These two problems require different observation mechanisms, different thresholds for what counts as a pattern, different tempos for when the system should act on what it has learned, and different decay profiles for when observations stop being relevant. The learning cycle, in other words, has to be designed for the judgment it serves.
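To make the contrast tangible, here is a minimal sketch of what the two observation forms might look like in Python. The names and fields are illustrative, not prescriptive; the structural point is that the delta is bounded by an artefact while the signal is bounded by a question.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeltaObservation:
    """Bounded by an artefact: the difference between draft and published."""
    artefact_id: str
    agent_output: str
    human_version: str
    captured_at: datetime

@dataclass
class SignalObservation:
    """Bounded by a question: one weak signal among many, never conclusive alone."""
    question_id: str    # e.g. "has this relationship's priority shifted?"
    source: str         # where the signal appeared
    description: str    # what was noticed, not what it means
    strength: float     # 0..1, individually inconclusive
    captured_at: datetime
```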
The five components of a use-case-specific learning cycle.
Every learning cycle I have built shares the same five structural components, but how each one works depends entirely on the domain.
Observation capture is about what the agent notices, and in what form. For a content system, observations are deltas between versions. For a system tracking relational shifts, observations are signals: a change in tone, a new stakeholder appearing, a shift in the kinds of questions being asked. The form of the observation determines what downstream pattern detection can even look for.
Decision points define where human judgment enters the cycle. In some domains, the human produces the correction directly by editing the output. In others, the human confirms or denies the agent's observation, which is a very different kind of feedback. The placement of the human in the cycle shapes the quality and type of learning signal the system receives.
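A sketch of the two placements, again with illustrative names. The point is that a direct edit carries the correction itself, while a confirmation carries only a judgment about the agent's observation.

```python
from dataclasses import dataclass

@dataclass
class DirectEdit:
    """The human corrects the output; the delta between versions is the signal."""
    observation_id: str
    edited_output: str

@dataclass
class ConfirmOrDeny:
    """The human judges the agent's observation rather than its output."""
    observation_id: str
    confirmed: bool
    note: str = ""
```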
Pattern detection is the process by which a collection of individual observations becomes something meaningful. I would argue this is where the domain-specificity matters most. In a content domain, three similar edits across different pieces might constitute a pattern. In a domain tracking behavioural shifts, five signals across three months pointing in the same direction might be what it takes. The threshold for "this is a pattern" is not a universal constant. It is a design decision that reflects the nature of the signal.
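As a sketch, those thresholds might be encoded like this. The counts mirror the examples above, the time windows are assumptions of mine, and the function assumes the caller has already grouped observations that point in the same direction; all of these are exactly the kind of per-domain design decisions the paragraph describes.

```python
from datetime import timedelta

# Per-domain design decisions, not universal constants.
PATTERN_THRESHOLDS = {
    "content":    {"min_count": 3, "window": timedelta(days=30)},
    "relational": {"min_count": 5, "window": timedelta(days=90)},
}

def is_pattern(domain: str, observations: list) -> bool:
    """Assumes each observation carries a captured_at timestamp and that
    the caller has grouped observations pointing the same way."""
    rule = PATTERN_THRESHOLDS[domain]
    if len(observations) < rule["min_count"]:
        return False
    stamps = [o.captured_at for o in observations]
    return max(stamps) - min(stamps) <= rule["window"]
```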
The upgrade mechanism determines what happens when a pattern is confirmed. I think of this in two speeds. Soft runtime adjustments happen automatically, with the agent weighting its future observations slightly toward what the pattern suggests. These are gentle, carry a confidence score, and decay if not reinforced. Permanent framework changes are the second speed, and they require human approval before the system's reasoning structure is updated. The previous version is always preserved for rollback.
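A minimal sketch of the two speeds, with illustrative names. What matters structurally is that a soft adjustment carries a confidence score that can fade, while a permanent change holds the previous version for rollback and has no effect until a human approves it.

```python
from dataclasses import dataclass

@dataclass
class SoftAdjustment:
    """Applied automatically; gently reweights future observations."""
    pattern_id: str
    weight_shift: float
    confidence: float          # decays if the pattern is not reinforced

@dataclass
class FrameworkChange:
    """Requires human approval; the old version is kept for rollback."""
    pattern_id: str
    new_framework: dict
    previous_version: dict
    approved_by: str | None = None

    @property
    def active(self) -> bool:
        return self.approved_by is not None
```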
Decay governs what happens to observations and patterns that are not reinforced over time. An observation from six months ago in a fast-moving domain may be worse than no observation at all, because it creates false confidence in something that no longer reflects reality. Decay is not cleanup. It is an active part of the learning architecture, and I will explore it more fully in the companion article on data decay.
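One plausible shape for decay is an exponential half-life per domain, sketched below with illustrative values. The mechanism lets unreinforced observations fade in influence rather than deleting them, so the audit trail survives while the false confidence does not.

```python
import math
from datetime import datetime, timedelta

# Illustrative half-lives; a fast-moving domain decays observations sooner.
HALF_LIFE = {
    "content": timedelta(days=60),
    "relational": timedelta(days=180),
}

def current_weight(domain: str, last_reinforced: datetime,
                   base_weight: float = 1.0) -> float:
    """Exponential decay since the observation was last reinforced."""
    age = datetime.now() - last_reinforced
    return base_weight * math.pow(0.5, age / HALF_LIFE[domain])
```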
Observation is not action. That boundary is structural.
I want to highlight a design principle that I think is easy to overlook. The separation between the observation phase and the action phase needs to be structural, not aspirational. By structural I mean that the observing agent literally cannot write to the frameworks. It writes to an observation store. A separate pattern detection process reads that store on its own cycle.
A single observation is not a pattern. Two observations are not a pattern. The system needs several confirmed instances before even a soft runtime adjustment, and sustained consistency over weeks before proposing a permanent change. This matters because without structural separation, there is always pressure to shortcut the process, to let a single strong signal trigger a change that should have required accumulated evidence.
This separation does three things that I think are worth naming explicitly. It prevents knee-jerk reactions to outliers. It forces the system to accumulate evidence before changing behaviour. And it creates a complete audit trail, so that when the system does propose a change, the human reviewing it can see the full chain of observations that led there.
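In code, the separation can be made literal through what each component is allowed to hold a reference to. A sketch, with illustrative names:

```python
class ObservationStore:
    """The only thing the observer can write to."""
    def __init__(self):
        self._records = []

    def append(self, observation) -> None:
        self._records.append(observation)

    def read_all(self) -> list:
        return list(self._records)

class Observer:
    """Holds the store and nothing else; it cannot touch the frameworks."""
    def __init__(self, store: ObservationStore):
        self._store = store

    def observe(self, observation) -> None:
        self._store.append(observation)

class PatternDetector:
    """Runs on its own cycle; reads the store and returns proposals only."""
    def __init__(self, store: ObservationStore):
        self._store = store

    def run_cycle(self, detect) -> list:
        # detect() is the domain's thresholding logic; nothing is applied here.
        return detect(self._store.read_all())
```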
The learning happens at the point of observation.
There is a temptation to treat learning as something that happens at the output layer, where the agent's final product is corrected and the correction is fed back in. But I would argue that the more important learning happens earlier, at the point where the agent decides what is worth noticing in the first place.
An agent encounters data. It forms an observation, not a conclusion, not an action. When observations accumulate and form a pattern, the pattern gets upgraded. It moves from the observation store into the active intelligence that the judgment frameworks can reference. The system is not memorising corrections. It is evolving what it considers worth noticing, which is a fundamentally different kind of learning.
This is the mechanism by which the system stays true to the current landscape. The agent's interpretive lens sharpens over time, not because someone told it what to see, but because the accumulation of observations, filtered through pattern detection and confirmed by human judgment, gradually refines what the system treats as significant.
Different cycles run at different tempos.
A content learning cycle might run on every human edit. A relational intelligence cycle might run on a completely different schedule, with pattern detection running monthly, because the underlying signals shift over quarters rather than days.
I think the tempos are as important as the components. A cycle that runs too fast for its domain will overfit to noise. A cycle that runs too slowly will miss genuine shifts. The tempo is not an implementation detail. It is a design decision that reflects how quickly the underlying reality changes in that particular domain.
Each of these is a learning cycle. Each operates on observation capture, pattern detection, upgrade, and decay. But the thresholds, tempos, and decision points are completely different, because the judgment each one serves is different.
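A sketch of what that difference might look like as configuration, with illustrative values. The content cycle is event-driven; the relational cycle runs on intervals keyed to how slowly its signals move.

```python
from datetime import timedelta

CYCLE_TEMPO = {
    "content": {
        "observe": "on_every_human_edit",          # event-driven
        "detect_patterns": timedelta(days=7),
    },
    "relational": {
        "observe": timedelta(days=1),              # daily sweep of sources
        "detect_patterns": timedelta(days=30),     # monthly; signals shift over quarters
    },
}
```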
When you get this right, the system mirrors how humans develop expertise.
I keep returning to this analogy because I think it captures something important about what these systems are actually doing. A person who develops deep expertise in a domain does not develop it by being told what to notice. They develop it by noticing things, checking whether their observations hold over time, and gradually building an internal model of what matters and what does not.
That is exactly what use-case-specific learning cycles produce: accumulated operational judgment, the quiet expertise that compounds over hundreds of interactions and makes the difference between an agent that processes data and one that understands it. The cycle has to be designed for the judgment it serves, because the judgment is what gives the learning its shape.