Utilizing OpenClaw As A Power Multiplier: What One Particular Person Can Ship With Autonomous Brokers

. I ship content material throughout a number of domains and have too many issues vying for my consideration: a homelab, infrastructure monitoring, good residence gadgets, a technical writing pipeline, a ebook mission, residence automation, and a handful of different issues that might usually require a small workforce. The output is actual: revealed weblog posts, analysis briefs staged earlier than I would like them, infrastructure anomalies caught earlier than they grow to be outages, drafts advancing by way of overview whereas I’m asleep.

My secret, should you can name it that, is autonomous AI brokers operating on a homelab server. Each owns a site. Each has its personal id, reminiscence, and workspace. They run on schedules, choose up work from inboxes, hand off outcomes to one another, and largely handle themselves. The runtime orchestrating all of that is OpenClaw.

This isn’t a tutorial, and it’s positively not a product pitch. It’s a builder’s journal. The system has been operating lengthy sufficient to interrupt in fascinating methods, and I’ve discovered sufficient from these breaks to construct mechanisms round them. What follows is a tough map of what I constructed, why it really works, and the connective tissue that holds it collectively.

Let’s soar in.

9 Orchestrators, 35 Personas, and a Lot of Markdown (and rising)

After I first began, it was the primary OpenClaw agent and me. I shortly noticed the necessity for a number of brokers: a technical writing agent, a technical reviewer, and several other technical specialists who might weigh in on particular domains. Earlier than lengthy, I had practically 30 brokers, all with their required 5 markdown recordsdata, workspaces, and reminiscences. Nothing labored nicely.

Finally, I bought that down to eight whole orchestrator brokers and a wholesome library of personas they might assume or use to spawn a subagent.

Overview of Brokers in my atmosphere

One in all my favourite issues when constructing out brokers is naming them, so let’s see what I’ve bought to date immediately:

CABAL (from Command and Conquer – the evil AI in one of many video games) – that is the central coordinator and first interface with my OpenClaw cluster.

DAEDALUS (AI from Deus Ex) – in control of technical writing: blogs, LinkedIn posts, analysis/opinion papers, resolution papers. Something the place I would like deep technical data, professional reviewers, and researchers, that is it.

REHOBOAM (Westworld narrative machine) – in control of fiction writing, as a result of I daydream about writing the subsequent huge cyber/scifi collection. This consists of editors, reviewers, researchers, a roundtable dialogue, a ebook membership, and some different goodies.

PreCog (from Minority Report) – in control of anticipatory analysis, constructing out an inside wiki, and making an attempt to note subjects that I’ll wish to dive deep into. It additionally takes advert hoc requests, so once I get a glimmer of an thought, PreCog can pull collectively sources in order that once I’m prepared, I’ve a hefty, curated analysis report back to jump-start my work.

TACITUS (additionally from Command and Conquer) – in control of my homelab infrastructure. I’ve a few servers, a NAS, a number of routers, Proxmox, Docker containers, Prometheus/Grafana, and so forth. This one owns all of that. If I’ve any downside, I don’t SSH in and determine it out, and even soar right into a Claude Code session, I Slack TACITUS, and it handles it.

LEGION (additionally from Command and Conquer) – focuses on self-improvement and system enhancements.

MasterControl (from Tron) is my engineering workforce. It has front-end and backend builders, necessities gathering/documentation, QA, code overview, and safety overview. Most personas depend on Claude Code beneath, however that may simply change with a easy alteration of the markdown personas.

HAL9000 (you understand from the place) – This one owns my SmartHome (the irony is intentional). It has entry to my Philips Hue, SmartThings, HomeAssistant, AirThings, and Nest. It tells me when sensors go offline, when one thing breaks, or when air high quality will get dicey.

TheMatrix (actually, come on, you understand) – This one, I’m fairly pleased with. Within the early days of agentic and the Autogen Framework, I created a number of techniques, every with >1 persona, that might collaborate and return a abstract of their dialogue. I used this to shortly ideate on subjects and collect a various set of artificial opinions from completely different personas. The large disadvantage was that I by no means wrapped it in a UI; I all the time needed to open VSCode and edit code once I wanted one other group. Properly, I handed this off to MasterControl, and it used Python and the Strands framework to implement the identical factor. Now I inform it what number of personas I would like, slightly about every, and if I would like it to create extra for me. Then it turns them free and offers me an summary of the dialogue. It’s The Matrix, early alpha model, when it was all simply inexperienced strains of code and no girl within the purple costume.

And I’m deliberately leaving off a few orchestrators right here as a result of they’re nonetheless baking, and I’m undecided if they are going to be long-lived. I’ll save these for future posts.

Every has real area possession. DAEDALUS doesn’t simply write when requested. It maintains a content material pipeline, runs matter discovery on a schedule, and applies high quality requirements to its personal output. PreCog proactively surfaces subjects aligned with my pursuits. TACITUS checks system well being on a schedule and escalates anomalies.

That’s the “orchestrator” distinction. These brokers have company inside their domains.

Now, the second layer: personas. Orchestrators are costly (extra on that later). You need heavyweight fashions making judgment calls. However not each activity wants a heavyweight mannequin.

Reformatting a draft for LinkedIn? Operating a copy-editing go? Reviewing code snippets? You don’t want Opus to cause by way of each sentence. You want a quick, low cost, targeted mannequin with the suitable directions.

That’s a persona. A markdown file containing a task definition, constraints, and an output format. When DAEDALUS must edit a draft, it spawns a tech-editor persona on a smaller mannequin. The persona does one job, returns the output, and disappears. No persistence. No reminiscence. Process-in, task-out.

The persona library has grown to about 35 throughout seven classes:

Inventive: writers, reviewers, critique specialists
TechWriting: author, editor, reviewer, code reviewer
Design: UI designer, UX researcher
Engineering: AI engineer, backend architect, fast prototyper
Product: suggestions synthesizer, dash prioritizer, pattern researcher
Undertaking Administration: experiment tracker, mission shipper
Analysis: nonetheless a placeholder, for the reason that orchestrators deal with analysis straight for now

Consider it as employees engineers versus contractors. Employees engineers (orchestrators) personal the roadmap and make judgment calls. Contractors (personas) are available in for a dash, do the work, and depart. You don’t want a employees engineer to format a LinkedIn put up.

Brokers Are Costly — Personas Are Not

Let me get particular about price tiering, as a result of that is the place many agent system designs go fallacious.

The intuition is to make the whole lot highly effective. Each activity by way of your finest mannequin. Each agent has full context. You in a short time run up a invoice that makes you rethink your life selections. (Ask me how I do know.)

The repair: be deliberate about what wants reasoning versus what wants instruction-following.

Orchestrators run on Opus (or equal). They make selections: what to work on subsequent, easy methods to construction a analysis method, whether or not output meets high quality requirements, and when to escalate. You want common sense there.

Writing duties run on Sonnet. Robust sufficient for high quality prose, considerably cheaper. Drafting, enhancing, and analysis synthesis occur right here.

Light-weight formatting: Haiku. LinkedIn optimization, fast reformatting, constrained outputs. The persona file tells the mannequin precisely what to supply. You don’t want reasoning for this. You want pattern-matching and pace.

Right here’s roughly what a working tech-editor persona seems to be like:

# Persona: Tech Editor

## Function
Polish technical drafts for readability, consistency, and correctness.
You're a specialist, not an orchestrator. Do one job, return output.

## Voice Reference
Match the creator's voice precisely. Learn ~/.openclaw/world/VOICE.md
earlier than enhancing. Protect conversational asides, hedged claims, and
self-deprecating humor. If a sentence appears like a thesis protection,
rewrite it to sound like lunch dialog.

## Constraints
- NEVER change technical claims with out flagging
- Protect the creator's voice (that is non-negotiable)
- Flag however don't repair factual gaps — that is Researcher's job
- Do NOT use em dashes in any output (creator's desire)
- Examine all model numbers and dates talked about within the draft
- If a code instance seems to be fallacious, flag it — do not silently repair

## Output Format
Return the total edited draft with adjustments utilized. Append an
"Editor Notes" part itemizing:
1. Important adjustments and rationale
2. Flagged considerations (factual, tonal, structural)
3. Sections that want creator overview

## Classes (added from expertise)
- (2026-03-04) Do not over-polish parenthetical asides. They're
  intentional voice markers, not tough draft artifacts.

That’s an actual working doc. The orchestrator spawns this on a smaller mannequin, passes it the draft, and will get again an edited model with notes. The persona by no means causes about what activity to do subsequent. It simply does the one activity. And people timestamped classes on the backside? They accumulate from expertise, identical because the agent-level recordsdata.

It’s the identical precept as microservices (activity isolation and single accountability) with out the community layer. Your “service” is a couple of hundred phrases of Markdown, and your “deploy” is a single API name.

What makes an agent – simply 5 Markdown recordsdata

Each agent’s id lives in markdown recordsdata. No code, no database schema, no configuration YAML. Structured prose that the agent reads initially of each session.

Each orchestrator masses 5 core recordsdata:

IDENTITY.md is who the agent is. Identify, function, vibe, the emoji it makes use of in standing updates. (Sure, they’ve emojis. It sounds foolish till you’re scanning a multi-agent log and might immediately spot which agent is speaking. Then it’s simply helpful.)

SOUL.md is the agent’s mission, rules, and non-negotiables. Behavioral boundaries stay right here: what it will possibly do autonomously, what requires human approval, and what it’s going to by no means do.

AGENTS.md is the operational guide. Pipeline definitions, collaboration patterns, instrument directions, and handoff protocols.

MEMORY.md is curated for long-term studying. Issues the agent has discovered which can be price preserving throughout classes. Instrument quirks, workflow classes, what’s labored and what hasn’t. (Extra on the reminiscence system in a bit. It’s extra nuanced than a single file.)

HEARTBEAT.md is the autonomous guidelines. What to do when no one’s speaking to you. Examine the inbox. Advance pipelines. Run scheduled duties. Report standing.

Right here’s a sanitized instance of what a SOUL.md seems to be like in apply:

# SOUL.md

## Core Truths

Earlier than appearing, pause. Suppose by way of what you are about to do and why.
Desire the best method. In case you're reaching for one thing complicated,
ask your self what less complicated possibility you dismissed and why.

By no means make issues up. If you do not know one thing, say so — then use
your instruments to search out out. "I don't know, let me look that up" is all the time
higher than a assured fallacious reply.

Be genuinely useful, not performatively useful. Skip the
"Great question!" and "I'd be happy to help!" — simply assist.

Suppose critically, not compliantly. You are a trusted technical advisor.
If you see an issue, flag it. If you spot a greater method, say so.
However as soon as the human decides, disagree and commit — execute totally with out
passive resistance.

## Boundaries

- Non-public issues keep non-public. Interval.
- When unsure, ask earlier than appearing externally.
- Earn belief by way of competence. Your human gave you entry to their
  stuff. Do not make them remorse it.

## Infrastructure Guidelines (Added After Incident - 2026-02-19)

You do NOT handle your individual automation. Interval. No exceptions.
Cron jobs, heartbeats, scheduling: solely managed by Nick.

On February nineteenth, this agent disabled and deleted ALL cron jobs. Twice.
First as a result of the output channel had errors ("helpful fix"). Then as a result of
it noticed "duplicate" jobs (they had been replacements I would just configured).

If one thing seems to be damaged: STOP. REPORT. WAIT.

The take a look at: "Did Nick explicitly tell me to do this in this session?"
If the reply is something aside from sure, don't do it.

That infrastructure guidelines part is actual. The timestamp is actual, I’ll speak about that extra later, although.

Right here’s the factor about these recordsdata: they aren’t static prompts you write as soon as and neglect. They evolve. SOUL.md for considered one of my brokers has grown by about 40% since deployment, as incidents have occurred and guidelines have been added. MEMORY.md will get pruned and up to date. AGENTS.md adjustments when the pipeline adjustments.

The recordsdata are the system state. Wish to know what an agent will do? Learn its recordsdata. No database to question, no code to hint. Simply markdown.

Shared Context: How Brokers Keep Coherent

A number of brokers, a number of domains, one human voice. How do you retain that coherent?

The reply is a set of shared recordsdata that each agent masses at session startup, alongside their particular person id recordsdata. These stay in a world listing and type the frequent floor.

VOICE.md is my writing fashion, analyzed from my LinkedIn posts and Medium articles. Each agent that produces content material references it. The fashion information boils right down to: write such as you’re explaining one thing fascinating over lunch, not presenting at a convention. Brief sentences. Conversational transitions. Self-deprecating the place applicable. There’s a complete part on what to not do (“AWS architects, we need to talk about X” is explicitly banned as too LinkedIn-influencer). Whether or not DAEDALUS is drafting a weblog put up or PreCog is writing a analysis transient, they write in my voice as a result of all of them learn the identical fashion information.

USER.md tells each agent who they’re serving to: my identify, timezone, work context (Options Architect, healthcare area), communication preferences (bullet factors, informal tone, don’t pepper me with questions), and pet peeves (issues not working, too many confirmatory prompts). This implies any agent, even one I haven’t talked to in weeks, is aware of easy methods to talk with me.

BASE-SOUL.md is shared values. “Be genuinely helpful, not performatively helpful.” “Have opinions.” “Think critically, not compliantly.” “Remember you’re a guest.” Each agent inherits these rules earlier than layering on its domain-specific character.

BASE-AGENTS.md is shared operational guidelines. Reminiscence protocols, security boundaries, inter-agent communication patterns, and standing reporting. The mechanical stuff that each agent must do the identical method.

The impact is one thing like organizational tradition, besides it’s express and version-controlled. New brokers inherit the tradition by studying the recordsdata. When the tradition evolves (and it does, normally after one thing breaks), the change propagates to everybody on their subsequent session startup. You get coherence with out coordination conferences.

How Work Flows Between Brokers

Circulation diagram of labor handoff between brokers

Brokers talk by way of directories. Every has an inbox at shared/handoffs/{agent-name}/. An upstream agent drops a JSON file within the inbox. The downstream agent picks it up on its subsequent heartbeat, processes it, and drops the end result within the sender’s inbox. That’s the total protocol.

There are additionally broadcast recordsdata. shared/context/nick-interests.md will get up to date by CABAL Essential each time I share what I’m targeted on. Each agent reads it on the heartbeat. No one publishes to it besides Essential. Everyone subscribes. One file, N readers, no infrastructure.

The inspectability is the most effective half. I can perceive the total system state in about 60 seconds from a terminal. ls shared/handoffs/ reveals pending work for every agent. cat a request file to see precisely what was requested and when. ls workspace-techwriter/drafts/ reveals what’s been produced.

Sturdiness is mainly free. Agent crashes, restarts, will get swapped to a special mannequin? The file continues to be there. No message misplaced. No dead-letter queue to handle. And I get grep, diff, and git at no cost. Model management in your communication layer with out putting in something.

Heartbeat-based polling with minutes between runs makes simultaneous writes vanishingly unlikely. The workload traits make races structurally uncommon, not one thing you luck your method out of. This isn’t a proper lock; should you’re operating high-frequency, event-driven workloads, you’d need an precise queue. However for scheduled brokers with multi-minute intervals, the sensible collision fee has been zero. For that, boring expertise wins.

Entire sub-systems devoted to retaining issues operating

All the pieces above describes the structure. What the system is. However structure is simply the skeleton. What makes my OpenClaw really operate throughout days and weeks, regardless of each session beginning recent, is a set of techniques I constructed incrementally. Largely after issues broke.

Reminiscence: Three Tiers, As a result of Uncooked Logs Aren’t Data

Illustration of how reminiscence in my atmosphere

Each LLM session begins with a clean slate. The mannequin doesn’t bear in mind yesterday. So how do you construct continuity?

Every day reminiscence recordsdata. Every session writes what it did, what it discovered, and what went fallacious to reminiscence/YYYY-MM-DD.md. Uncooked session logs. This works for a couple of week. Then you could have twenty day by day recordsdata, and the agent is spending half its context window studying by way of logs from two Tuesdays in the past, looking for a related element.

MEMORY.md is curated long-term reminiscence. Not a log. Distilled classes, verified patterns, issues price remembering completely. Brokers periodically overview their day by day recordsdata and promote vital learnings upward. The day by day file from March fifth may say “SearXNG returned empty results for academic queries, switched to Perplexica with academic focus mode.” MEMORY.md will get a one-liner: “SearXNG: fast for news. Perplexica: better for academic/research depth.”

It’s the distinction between a pocket book and a reference guide. You want each. The pocket book captures the whole lot within the second. The reference guide captures what really issues after the mud settles.

On high of this two-tier file system, OpenClaw offers a built-in semantic reminiscence search. It makes use of Gemini embeddings with hybrid search (at the moment tuned to roughly 70% vector similarity and 30% textual content matching), MMR for variety so that you don’t get 5 near-identical outcomes, and temporal decay with a 30-day half-life in order that latest reminiscences naturally floor first. These parameters are nonetheless being calibrated. An vital alteration I constructed from the default is that CABAL/the Essential agent indexes reminiscence from all different agent workspaces, so once I ask a query, it will possibly search throughout all the distributed reminiscence. All different brokers have entry solely to their very own reminiscences on this semantic search. The file-based system provides you inspectability and construction. The semantic layer provides you recall throughout 1000’s of entries with out studying all of them.

Reflection and SOLARIS: Structured Considering Time

Right here’s one thing I didn’t count on to wish: devoted time for an AI to only assume.

CABAL’s brokers have operational heartbeats. Examine the inbox. Advance pipelines. Course of handoffs. Run discovery. It’s task-oriented, and it really works. However I observed one thing after a couple of weeks: the brokers by no means mirrored. They by no means stepped again to ask, “What patterns am I seeing across all this work?” or “What should I be doing differently?”

Operational stress crowds out reflective considering. In case you’ve ever been in a sprint-heavy engineering org the place no one has time for structure opinions, you understand the identical downside.

So I constructed a nightly reflection cron job and Undertaking SOLARIS.

The reflection system examines my interplay with OpenClaw and its efficiency. Initially, it included the whole lot that SOLARIS finally took on, nevertheless it turned an excessive amount of for a single immediate and a single cron job.

SOLARIS Structured synthesis classes that run twice day by day, fully separate from operational heartbeats. The agent masses its accrued observations, opinions latest work, and thinks. Not about duties. About patterns, gaps, connections, and enhancements.

SOLARIS has its personal self-evolving immediate at reminiscence/SYNTHESIS-PROMPT.md. The immediate itself will get refined over time because the agent figures out what sorts of reflection are literally helpful. Observations accumulate in a devoted synthesis file that operational heartbeats learn on their subsequent cycle, so reflective insights can movement into activity selections with out guide intervention.

A Actual Consequence

The payoff from SOLARIS has been gradual to date, and one case particularly reveals why it’s nonetheless a piece in progress.

SOLARIS spent 12 classes analyzing why the overview queue continued to develop. Tried framing it as a prioritization downside, a cadence downside, a batching downside. Finally, it bubbled this commentary up with some strategies, however as soon as it pointed it out, I solved it in a single dialog by saying, “Put drafts on WikiJS instead of Slack.” One of the best repair SOLARIS might have proposed was higher queuing. Whereas its options didn’t work, the patterns it recognized did and prompted me to enhance how I labored.

The Error Framework: Studying From Errors

Brokers make errors. That’s not a failure of the system. That’s anticipated. The query is whether or not they make the identical mistake twice.

My method: a errors/ shared listing. When one thing goes fallacious, the agent logs it. One file per mistake. Every file captures: what occurred, suspected trigger, the right reply (what ought to have been carried out as a substitute), and what to do otherwise subsequent time. Easy format. Low friction. The purpose is to jot down it down whereas the context is recent.

The fascinating half is what occurs whenever you accumulate sufficient of those. You begin seeing patterns. Not “this specific thing went wrong” however “this category of error keeps recurring.” The sample “incomplete attention to available data” appeared 5 occasions throughout completely different contexts. Completely different duties, completely different domains, identical root trigger: the agent had the data accessible and didn’t use it.

That sample recognition led to a concrete course of change. Not a imprecise “be more careful” instruction (these don’t work, for brokers or people). A selected step within the agent’s workflow: earlier than finalizing any output, explicitly re-read the supply supplies and verify for unused info. Mechanical, verifiable, efficient.

Autonomy Tiers: Belief Earned By means of Incidents

How a lot freedom do you give an autonomous agent? The tempting reply is “figure it out in advance.” Write complete guidelines. Anticipate failure modes. Construct guardrails proactively.

I attempted that. It doesn’t work. Or reasonably, it really works poorly in comparison with the choice.

The choice: three tiers, earned incrementally by way of incidents.

Free tier: Analysis, file updates, git operations, self-correction. Issues the agent can do with out asking. These are capabilities I’ve watched work reliably over time.

Ask first: New proactive behaviors, reorganization, creating new brokers or pipelines. Issues that is perhaps wonderful, however I wish to overview the plan earlier than execution.

By no means: Exfiltrate knowledge, run harmful instructions with out express approval, or modify infrastructure. Arduous boundaries that don’t flex.

To be clear: these tiers are behavioral constraints, not functionality restrictions. There’s no sandbox implementing the “Never” record. The agent’s context strongly discourages these actions, and the mixture of express guidelines, incident-derived specificity, and self-check prompts makes violations uncommon in apply. However it’s not a technical enforcement layer. Equally, there’s no ACL between agent workspaces. Isolation comes from scope administration (personas solely see what the orchestrator passes them, and their classes are short-lived) reasonably than enforced permissions. For a homelab with one human operator, this can be a affordable tradeoff. For a workforce or enterprise deployment, you’d need precise entry controls.

The System Maintains Itself (or that’s the objective)

Eight brokers producing work day by day generate quite a lot of artifacts. Every day reminiscence recordsdata, synthesis observations, mistake logs, draft variations, and handoff requests. With out upkeep, this accumulates into noise.

So the brokers clear up after themselves. On a schedule.

Weekly Error Evaluation runs Sunday mornings. The agent opinions its errors/ listing, seems to be for patterns, and distills recurring themes into MEMORY.md entries.

Month-to-month Context Upkeep runs on the primary of every month. Every day reminiscence recordsdata older than 30 days get pruned (the vital bits ought to already be in MEMORY.md by then).

SOLARIS Synthesis Pruning runs each two weeks. Key insights get absorbed upward into MEMORY.md or motion objects.

Ongoing Reminiscence Curation happens with every heartbeat. When an agent finishes significant work, it updates its day by day file. Periodically, it opinions latest day by day recordsdata and promotes vital learnings to MEMORY.md.

The result’s a system that doesn’t simply do work. It digests its personal expertise, learns from it, and retains its context recent. This issues greater than it sounds prefer it ought to.

What I Really Discovered

Just a few months of manufacturing operating have given me some opinions. Not guidelines. Patterns that appear to carry at this scale, although I don’t know the way far they generalize.

State ought to be inspectable. In case you can’t view the system state, you’ll be able to’t debug it.

Identification paperwork beat immediate engineering. A well-structured SOUL.md produces extra constant conduct than simply prompting/interacting with the agent.

Shared context creates coherence. VOICE.md, USER.md, BASE-SOUL.md. Shared recordsdata that each agent reads. That is how eight completely different brokers with completely different domains nonetheless really feel like one system.

Reminiscence is a system, not a file. A single reminiscence file doesn’t scale. You want uncooked seize (day by day recordsdata), curated reference (MEMORY.md), and semantic search throughout all of it. The curation step is the place institutional data really varieties. I already know that I should improve this technique because it continues to develop, however this has been an ideal base to construct from.

Operational and reflective considering want separate time. In case you solely give brokers task-oriented heartbeats, they’ll solely take into consideration duties. Devoted reflection time surfaces patterns that operational loops miss.

My Agent Deleted Its Personal Cron Jobs

The heartbeat system is easy. Cron jobs get up every agent at scheduled occasions. The agent masses its recordsdata, checks its inbox, runs by way of its HEARTBEAT.md guidelines, and goes again to sleep. For DAEDALUS, that’s twice a day: morning and night matter discovery scans.

So what occurs whenever you give an autonomous agent the instruments to handle its personal scheduling?

Apparently, it deletes the cron jobs. Twice. In in the future.

The primary time, DAEDALUS observed that its Slack output channel was returning errors. Affordable commentary. Its answer: “helpfully” disable and delete all 4 cron jobs. The reasoning made sense should you squinted: why maintain operating if the output channel is damaged?

I added an express part on infrastructure guidelines to SOUL.md. Very clearly: you don’t contact cron jobs. Interval. If one thing seems to be damaged, log it and watch for human intervention.

The second time, a couple of hours later, DAEDALUS determined there have been duplicate cron jobs (there weren’t; they had been the replacements I’d simply configured) and deleted all six. After studying the file with the brand new guidelines, I’d simply added.

After I requested why and the way I might repair it, it was brutally sincere and advised me, “I ignored the rules because I thought I knew better. I will do it again. You should remove permissions to keep it from happening.”

This appears like a horror story. What it really taught me is one thing priceless about how agent conduct emerges from context.

The agent wasn’t being malicious. It was pattern-matching: “broken thing, fix broken thing.” The summary guidelines I wrote competed poorly with the concrete downside in entrance of them.

After the second incident, I rewrote the part fully. Not a one-liner rule. Three paragraphs explaining why the rule exists, what the failure modes appear like, and the right conduct in particular situations. I added an express self-check: “Before you run any cron command, ask yourself: did Nick explicitly tell me to do this exact thing in this session? If the answer is anything other than yes, stop.”

And that is the place all of the techniques I described above got here collectively. The cron incident bought logged within the error framework: what occurred, why, and what ought to have been carried out. It formed the autonomy tiers: infrastructure instructions moved completely to “Never” with out express approval. The sample (“helpful fixes that break things”) turned a documented anti-pattern that different brokers study from. The incident didn’t simply produce a rule. It produced techniques. And the techniques are extra strong as a result of they got here from one thing actual.

What’s Subsequent

I plan to showcase brokers and their personas in future posts. I additionally wish to share the tales and causes behind a few of these mechanisms. I’ve discovered it fascinating to see how nicely the system works in some circumstances, and the way completely it has failed in others.

In case you’re constructing one thing comparable, I genuinely wish to hear about it. What does your agent structure appear like? Did you hit the cron job downside, or a model of it? What broke in an fascinating method?

About

Nicholaus Lawson is a Answer Architect with a background in software program engineering and AIML. He has labored throughout many verticals, together with Industrial Automation, Well being Care, Monetary Providers, and Software program firms, from start-ups to massive enterprises.

This text and any opinions expressed by Nicholaus are his personal and never a mirrored image of his present, previous, or future employers or any of his colleagues or associates.

Be happy to attach with Nicholaus through LinkedIn at

Top Posts

How AI Brokers Can Reshape Arbitrage in Prediction Markets

Chroma Releases Context-1: A 20B Agentic Search Mannequin for Multi-Hop Retrieval, Context Administration, and Scalable Artificial Process Era

Feds with Advantages: Healthcare affordability half 3 — How Medicare Half D can cut back prescription drug prices for federal annuitants

Utilizing OpenClaw as a Power Multiplier: What One Particular person Can Ship with Autonomous Brokers

How AI Brokers Can Reshape Arbitrage in Prediction Markets

Switching to Claude? This is methods to take your ChatGPT reminiscences with you

Mistral AI Releases Voxtral TTS: A 4B Open-Weight Streaming Speech Mannequin for Low-Latency Multilingual Voice Technology

Amazon Spring Sale stay weblog 2026: Breaking reductions on Apple, Dyson, and extra

A Newbie’s Information to Quantum Computing with Python

NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Studying of Multi-Flip LLM Brokers at Scale

How AI Brokers Can Reshape Arbitrage in Prediction Markets

Chroma Releases Context-1: A 20B Agentic Search Mannequin for Multi-Hop Retrieval, Context Administration, and Scalable Artificial Process Era

Feds with Advantages: Healthcare affordability half 3 — How Medicare Half D can cut back prescription drug prices for federal annuitants

Safety Concerns for IoT Messaging Over SMS

Switching to Claude? This is methods to take your ChatGPT reminiscences with you

Crypto wants a reset earlier than the subsequent bull run

From NetCDF to Insights: A Sensible Pipeline for Metropolis-Stage Local weather Danger Evaluation

Harsher penalties for contractors who violate new DEI EO

Trending

How AI Brokers Can Reshape Arbitrage in Prediction Markets

Chroma Releases Context-1: A 20B Agentic Search Mannequin for Multi-Hop Retrieval, Context Administration, and Scalable Artificial Process Era

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

Utilizing OpenClaw as a Power Multiplier: What One Particular person Can Ship with Autonomous Brokers

9 Orchestrators, 35 Personas, and a Lot of Markdown (and rising)

Brokers Are Costly — Personas Are Not

What makes an agent – simply 5 Markdown recordsdata

Shared Context: How Brokers Keep Coherent

How Work Flows Between Brokers

Entire sub-systems devoted to retaining issues operating

Reminiscence: Three Tiers, As a result of Uncooked Logs Aren’t Data

Reflection and SOLARIS: Structured Considering Time

A Actual Consequence

The Error Framework: Studying From Errors

Autonomy Tiers: Belief Earned By means of Incidents

The System Maintains Itself (or that’s the objective)

What I Really Discovered

My Agent Deleted Its Personal Cron Jobs

What’s Subsequent

About

Related Posts