
Multi-Agent AI Systems: Architecture and Implementation

Uvin Vindula · February 19, 2024 · 13 min read

TL;DR

Multi-agent AI systems split complex tasks across specialized agents instead of forcing one model to do everything. There are three core architecture patterns: orchestrator (one agent delegates to specialists), pipeline (agents process sequentially), and swarm (agents collaborate dynamically). I built IAMUVIN OS — a private multi-business command center — using Claude AI agents in an orchestrator pattern, and I use multi-agent workflows daily through Claude Code. The honest truth: most projects do not need multi-agent. A single well-prompted agent handles 80% of use cases. Multi-agent adds value when you have genuinely distinct domains of expertise, when tasks benefit from parallel execution, or when you need agents to check each other's work. This article covers the architecture patterns, TypeScript implementation, communication protocols, error recovery, and specific guidance on when multi-agent is worth the complexity.


What Multi-Agent Actually Means

The term "multi-agent" gets thrown around loosely. Let me be precise about what it means in practice, because I see teams overcomplicating this constantly.

A multi-agent AI system is a set of LLM-powered processes that communicate with each other to accomplish a goal that no single agent handles well alone. Each agent has a defined role, a scoped set of tools, and a communication protocol for passing context between agents.

That is it. No blockchain consensus. No emergent intelligence. Just specialized LLM calls coordinated by code you write.

Here is the mental model I use: think of agents like microservices. A monolithic application (single agent) works fine until your responsibilities grow complex enough that separation improves reliability, testability, and performance. The same threshold applies to AI agents.

When I built IAMUVIN OS — the system I use daily to manage multiple businesses, projects, and client work — I started with a single Claude agent handling everything. It worked for a while. Then I needed it to handle financial analysis, code review, content generation, and client communication in the same session. Context windows got polluted. The agent would confuse a client brief with a code review. Quality dropped.

The fix was not a bigger model or a longer context window. It was separation of concerns — the same principle that makes good software architecture.

Single Agent (monolith):
┌──────────────────────────────────┐
│  One agent handles everything    │
│  - Code review                   │
│  - Content writing               │
│  - Financial analysis            │
│  - Client communication          │
│  Context pollution inevitable    │
└──────────────────────────────────┘

Multi-Agent (separated):
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│  Code    │  │ Content  │  │ Finance  │  │  Client  │
│  Review  │  │  Writer  │  │ Analyst  │  │  Comms   │
│  Agent   │  │  Agent   │  │  Agent   │  │  Agent   │
└────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
     └──────────┬──┴──────────┬──┴──────────────┘
          ┌─────┴──────┐
          │Orchestrator│
          └────────────┘

The key insight: each agent gets its own system prompt, its own tool set, and only the context it needs. The code review agent never sees financial data. The content writer never gets confused by TypeScript diffs. Isolation improves quality.

Architecture Patterns — Orchestrator, Pipeline, Swarm

There are three patterns I have used in production. Each solves a different coordination problem.

Pattern 1: Orchestrator

One central agent receives the user request, decides which specialist agent to invoke, collects results, and synthesizes the final response. This is the pattern I use in IAMUVIN OS and the one I recommend for most teams starting out.

typescript
interface Agent {
  name: string;
  systemPrompt: string;
  tools: Tool[];
  model: string;
}

interface OrchestratorResult {
  agent: string;
  response: string;
  confidence: number;
}

async function orchestrate(
  userMessage: string,
  agents: Agent[]
): Promise<string> {
  // Step 1: Route to the right specialist
  const routing = await callLLM({
    system: `You are a routing agent. Given a user message, decide which 
specialist agent should handle it. Available agents: 
${agents.map(a => `- ${a.name}: ${a.systemPrompt.slice(0, 100)}`).join('\n')}
Respond with JSON: { "agent": "name", "reason": "why" }`,
    message: userMessage,
    model: 'claude-sonnet-4-20250514',
  });

  const { agent: selectedName } = JSON.parse(routing);
  const selected = agents.find(a => a.name === selectedName);

  if (!selected) {
    throw new AgentRoutingError(`No agent found: ${selectedName}`);
  }

  // Step 2: Execute with the specialist
  const result = await callLLM({
    system: selected.systemPrompt,
    message: userMessage,
    tools: selected.tools,
    model: selected.model,
  });

  return result;
}

When to use orchestrator: You have 3-10 distinct specialist domains. User requests are clearly categorizable. You want centralized control over routing logic.

Trade-off: The orchestrator is a single point of failure. If it misroutes, the specialist gets a task it cannot handle. I mitigate this by having each specialist validate that the request matches its domain before executing.
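That specialist-side validation can be sketched as a pre-flight yes/no question before any real work happens. Nothing below comes from an SDK: `LLMCall` is an injected stand-in for whatever client you use, and `executeWithDomainCheck` is a hypothetical wrapper name.

```typescript
// Hypothetical guard: the specialist confirms the request is in its domain
// before executing. The LLM call is injected so this compiles standalone.
type LLMCall = (system: string, message: string) => Promise<string>;

// Defensive parse: anything that does not clearly start with "yes"
// counts as out-of-domain.
function parseDomainCheck(answer: string): boolean {
  return answer.trim().toLowerCase().startsWith('yes');
}

async function executeWithDomainCheck(
  agentName: string,
  domainDescription: string,
  userMessage: string,
  call: LLMCall
): Promise<string> {
  const check = await call(
    `You are ${agentName} (${domainDescription}). Answer ONLY "yes" or "no": ` +
      `is the following request within your domain?`,
    userMessage
  );

  if (!parseDomainCheck(check)) {
    // Surface a routing error so the orchestrator can pick another agent.
    throw new Error(`${agentName} rejected request as out of domain`);
  }

  return call(`You are ${agentName}.`, userMessage);
}
```

The check costs one extra cheap LLM call, but it converts a silent misroute into an explicit error the orchestrator can recover from.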

Pattern 2: Pipeline

Agents process sequentially, each one transforming the output before passing it to the next. Think of it like Unix pipes — each stage does one thing well.

typescript
interface PipelineStage {
  agent: Agent;
  transform: (input: string) => string;
  validate: (output: string) => boolean;
}

async function executePipeline(
  input: string,
  stages: PipelineStage[]
): Promise<string> {
  let current = input;

  for (const stage of stages) {
    const transformed = stage.transform(current);

    const result = await callLLM({
      system: stage.agent.systemPrompt,
      message: transformed,
      tools: stage.agent.tools,
      model: stage.agent.model,
    });

    if (!stage.validate(result)) {
      throw new PipelineValidationError(
        `Stage "${stage.agent.name}" produced invalid output`
      );
    }

    current = result;
  }

  return current;
}

// Example: Content pipeline
const contentPipeline: PipelineStage[] = [
  {
    agent: researchAgent,
    transform: (topic) => `Research this topic thoroughly: ${topic}`,
    validate: (output) => output.length > 500,
  },
  {
    agent: writerAgent,
    transform: (research) => `Write an article based on: ${research}`,
    validate: (output) => output.includes('##'),
  },
  {
    agent: editorAgent,
    transform: (draft) => `Edit for clarity and accuracy: ${draft}`,
    validate: (output) => output.length > 0,
  },
];

When to use pipeline: Tasks naturally decompose into sequential steps. Each step has clear input/output contracts. You want easy debugging — you can inspect the state between any two stages.
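That between-stage visibility is easy to make explicit. The sketch below reworks the pipeline loop with the LLM call injected as a parameter and a trace of every intermediate output; `executeTracedPipeline` and `StageRunner` are illustrative names, not part of any framework.

```typescript
// Hypothetical traced variant of the pipeline loop. The LLM call is injected
// (`run`) so the loop itself is pure coordination logic, and every
// intermediate output is recorded so a bad result can be pinned to a stage.
type StageRunner = (systemPrompt: string, message: string) => Promise<string>;

interface TracedStage {
  name: string;
  systemPrompt: string;
  transform: (input: string) => string;
  validate: (output: string) => boolean;
}

async function executeTracedPipeline(
  input: string,
  stages: TracedStage[],
  run: StageRunner
): Promise<{
  output: string;
  trace: { stage: string; output: string }[];
  failedStage?: string;
}> {
  let current = input;
  const trace: { stage: string; output: string }[] = [];

  for (const stage of stages) {
    const result = await run(stage.systemPrompt, stage.transform(current));
    trace.push({ stage: stage.name, output: result }); // inspectable later

    if (!stage.validate(result)) {
      // Return instead of throwing so the caller keeps the partial trace.
      return { output: current, trace, failedStage: stage.name };
    }
    current = result;
  }

  return { output: current, trace };
}
```

A failed validation now comes back with the partial trace attached, which turns "the pipeline broke" into "stage N returned X for input Y".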

Trade-off: Latency scales linearly with the number of stages. A 5-stage pipeline with 3-second LLM calls takes 15 seconds minimum. No parallelism.

Pattern 3: Swarm

Multiple agents work simultaneously on subtasks, then their outputs are aggregated. This is the most complex pattern and the one most often implemented badly.

typescript
import pMap from 'p-map'; // concurrency-limited map (npm: p-map)

interface SwarmTask {
  agent: Agent;
  subtask: string;
}

interface SwarmConfig {
  maxConcurrency: number;
  timeout: number;
  aggregator: Agent;
}

async function executeSwarm(
  tasks: SwarmTask[],
  config: SwarmConfig
): Promise<string> {
  const { maxConcurrency, timeout, aggregator } = config;

  // Execute subtasks with controlled concurrency
  const results = await pMap(
    tasks,
    async (task) => {
      const controller = new AbortController();
      const timer = setTimeout(
        () => controller.abort(),
        timeout
      );

      try {
        const result = await callLLM({
          system: task.agent.systemPrompt,
          message: task.subtask,
          tools: task.agent.tools,
          model: task.agent.model,
          signal: controller.signal,
        });
        return { agent: task.agent.name, result, status: 'success' as const };
      } catch (error) {
        return { agent: task.agent.name, result: '', status: 'failed' as const };
      } finally {
        clearTimeout(timer);
      }
    },
    { concurrency: maxConcurrency }
  );

  // Aggregate results
  const successful = results.filter(r => r.status === 'success');
  const aggregated = await callLLM({
    system: aggregator.systemPrompt,
    message: `Synthesize these results:\n${successful
      .map(r => `[${r.agent}]: ${r.result}`)
      .join('\n\n')}`,
    model: aggregator.model,
  });

  return aggregated;
}

When to use swarm: Subtasks are genuinely independent. Speed matters more than sequential reasoning. You have a reliable aggregation strategy. I use this for code review — one agent checks security, another checks performance, a third checks style — then an aggregator combines findings.

Trade-off: Aggregation is the hard part. Combining outputs from 5 agents without losing information or creating contradictions requires a capable aggregator model. I always use Claude Opus for the aggregator, even if the worker agents run on Sonnet.
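As a concrete illustration of that code-review swarm, here is a sketch that fans a diff out to three reviewer agents. The agent definitions are simplified stubs (the full `Agent` shape above also carries tools), and the resulting array is the kind of input `executeSwarm` expects.

```typescript
// Hypothetical helper: build one swarm task per review specialty.
// Agent definitions here are illustrative stubs, not production prompts.
interface ReviewAgent {
  name: string;
  systemPrompt: string;
  model: string;
}

interface ReviewTask {
  agent: ReviewAgent;
  subtask: string;
}

function buildReviewSwarm(diff: string): ReviewTask[] {
  const reviewers: ReviewAgent[] = [
    {
      name: 'security',
      systemPrompt: 'Review only for security vulnerabilities.',
      model: 'claude-sonnet-4-20250514',
    },
    {
      name: 'performance',
      systemPrompt: 'Review only for performance bottlenecks.',
      model: 'claude-sonnet-4-20250514',
    },
    {
      name: 'style',
      systemPrompt: 'Review only for style and naming issues.',
      model: 'claude-sonnet-4-20250514',
    },
  ];

  // Each reviewer gets the same diff but a scoped instruction, so findings
  // stay in-lane and the aggregator can merge them without overlap.
  return reviewers.map(agent => ({
    agent,
    subtask: `Review this diff from your specialty only:\n${diff}`,
  }));
}
```

Keeping each subtask scoped to one specialty is what makes the aggregation step tractable: the aggregator merges three disjoint lists instead of reconciling three overlapping reviews.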

Building Your First Multi-Agent System

If you are reading this and want to build a multi-agent system, start with the orchestrator pattern. Here is a complete, minimal implementation you can run today.

typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

interface AgentConfig {
  name: string;
  description: string;
  systemPrompt: string;
  model: string;
}

const agents: AgentConfig[] = [
  {
    name: 'code-analyst',
    description: 'Analyzes code for bugs, performance, and architecture',
    systemPrompt: `You are a senior code analyst. Review code for:
1. Bugs and logic errors
2. Performance bottlenecks
3. Architecture concerns
Be specific. Reference line numbers. Suggest fixes with code.`,
    model: 'claude-sonnet-4-20250514',
  },
  {
    name: 'docs-writer',
    description: 'Writes technical documentation and API references',
    systemPrompt: `You are a technical writer. Create clear documentation.
Use concise language. Include code examples. Structure with headers.
Target audience: experienced developers who want facts, not fluff.`,
    model: 'claude-sonnet-4-20250514',
  },
  {
    name: 'security-auditor',
    description: 'Audits code for security vulnerabilities',
    systemPrompt: `You are a security auditor. Check for:
1. OWASP Top 10 vulnerabilities
2. Input validation gaps
3. Authentication/authorization flaws
4. Data exposure risks
Rate each finding: Critical / High / Medium / Low.`,
    model: 'claude-sonnet-4-20250514',
  },
];

async function routeToAgent(userMessage: string): Promise<string> {
  const routingResponse = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 200,
    system: `Route the user message to the best agent. 
Available: ${agents.map(a => `${a.name} (${a.description})`).join(', ')}
Respond with ONLY the agent name, nothing else.`,
    messages: [{ role: 'user', content: userMessage }],
  });

  const agentName = routingResponse.content[0].type === 'text'
    ? routingResponse.content[0].text.trim()
    : '';

  const agent = agents.find(a => a.name === agentName);

  if (!agent) {
    // Fallback: use the first agent or handle gracefully
    return executeAgent(agents[0], userMessage);
  }

  return executeAgent(agent, userMessage);
}

async function executeAgent(
  agent: AgentConfig,
  message: string
): Promise<string> {
  const response = await anthropic.messages.create({
    model: agent.model,
    max_tokens: 4096,
    system: agent.systemPrompt,
    messages: [{ role: 'user', content: message }],
  });

  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}

This is roughly 80 lines of code. It gives you a working multi-agent system with routing. No frameworks. No abstractions you do not understand. Start here, then add complexity only when you hit a real limitation.
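One fragile spot in this minimal version is the router's reply: models sometimes wrap the agent name in quotes, backticks, or trailing punctuation, and the exact-match `agents.find` then misses. A small normalizer (a hypothetical helper, not part of the SDK) makes the lookup forgiving:

```typescript
// Normalize a model's routing reply to a bare agent name: trim whitespace,
// lowercase, and strip quotes/backticks/periods that models sometimes add.
function normalizeAgentName(raw: string): string {
  return raw.trim().toLowerCase().replace(/^[`"' ]+|[`"'. ]+$/g, '');
}
```

With this in place, the lookup becomes `agents.find(a => a.name === normalizeAgentName(agentName))`, which in my experience removes most spurious fallback routing.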

Communication Between Agents

The hardest problem in multi-agent systems is not getting individual agents to work — it is getting them to communicate effectively. I have tried three approaches and settled on structured message passing.

Approach 1: Raw Text Passing (Do Not Do This)

Passing the full text output of one agent as input to another sounds simple. In practice, it causes context bloat, hallucination amplification, and loss of structured data. If Agent A hallucinates a fact, Agent B treats it as ground truth and builds on it.

Approach 2: Structured Message Protocol (What I Use)

Define a typed message format that agents produce and consume. This gives you validation boundaries between agents.

typescript
interface AgentMessage {
  from: string;
  to: string;
  type: 'request' | 'response' | 'error' | 'handoff';
  payload: {
    task: string;
    context: Record<string, unknown>;
    constraints: string[];
  };
  metadata: {
    timestamp: number;
    requestId: string;
    parentRequestId?: string;
  };
}

function validateMessage(msg: AgentMessage): boolean {
  if (!msg.from || !msg.to) return false;
  if (!msg.payload.task) return false;
  if (!msg.metadata.requestId) return false;
  return true;
}

async function sendAgentMessage(
  msg: AgentMessage,
  targetAgent: AgentConfig
): Promise<AgentMessage> {
  const response = await anthropic.messages.create({
    model: targetAgent.model,
    max_tokens: 4096,
    system: `${targetAgent.systemPrompt}

You will receive structured task messages. Always respond in this JSON format:
{
  "task": "description of what you did",
  "result": "your output",
  "confidence": 0.0-1.0,
  "warnings": ["any concerns"]
}`,
    messages: [{
      role: 'user',
      content: JSON.stringify(msg.payload),
    }],
  });

  const text = response.content[0].type === 'text'
    ? response.content[0].text
    : '';

  return {
    from: targetAgent.name,
    to: msg.from,
    type: 'response',
    payload: {
      task: msg.payload.task,
      context: JSON.parse(text),
      constraints: [],
    },
    metadata: {
      timestamp: Date.now(),
      requestId: crypto.randomUUID(),
      parentRequestId: msg.metadata.requestId,
    },
  };
}

Approach 3: Shared State Store

For swarm patterns where multiple agents need to read and write shared context, I use a simple in-memory store with read/write locks. In production IAMUVIN OS, this is backed by a Supabase table so state survives process restarts.

typescript
interface SharedState {
  data: Map<string, unknown>;
  locks: Map<string, string>;
}

class AgentStateStore {
  private state: SharedState = {
    data: new Map(),
    locks: new Map(),
  };

  async read(key: string): Promise<unknown> {
    return this.state.data.get(key);
  }

  async write(
    key: string,
    value: unknown,
    agentId: string
  ): Promise<boolean> {
    // Note: in this single-process sketch the lock is taken and released
    // within one synchronous call, so it is illustrative only. The
    // production version holds a database-level lock across the whole
    // read-modify-write.
    const lock = this.state.locks.get(key);
    if (lock && lock !== agentId) {
      return false; // Another agent holds the lock
    }
    this.state.locks.set(key, agentId);
    this.state.data.set(key, value);
    this.state.locks.delete(key);
    return true;
  }
}

The message protocol approach scales better for most use cases. Shared state introduces coordination complexity that mirrors the distributed systems problems you already know from backend engineering — race conditions, stale reads, deadlocks. Only use it when agents genuinely need to collaborate on the same data.

Error Recovery and Fallbacks

Multi-agent systems fail in ways single agents do not. An agent can time out. The orchestrator can misroute. Two agents can produce contradictory results. You need to plan for all of these.

Here is the error recovery hierarchy I use:

typescript
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

interface RetryConfig {
  maxRetries: number;
  backoffMs: number;
  fallbackAgent?: AgentConfig;
}

async function executeWithRecovery(
  agent: AgentConfig,
  message: string,
  config: RetryConfig
): Promise<string> {
  let lastError: Error | null = null;

  // Retry with the primary agent
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    try {
      const result = await executeAgent(agent, message);

      // Validate the output is not empty or malformed
      if (!result || result.length < 10) {
        throw new Error('Agent returned empty or trivial response');
      }

      return result;
    } catch (error) {
      lastError = error as Error;
      await sleep(config.backoffMs * Math.pow(2, attempt));
    }
  }

  // Fallback to a different agent
  if (config.fallbackAgent) {
    try {
      return await executeAgent(config.fallbackAgent, message);
    } catch (fallbackError) {
      // Both agents failed
    }
  }

  // Final fallback: return a structured error
  throw new AgentExecutionError(
    `Agent "${agent.name}" failed after ${config.maxRetries} retries`,
    lastError
  );
}

Three specific failure modes I have encountered in production:

  1. Routing loops. The orchestrator routes to Agent A, which determines the task is outside its scope and asks the orchestrator to reroute. The orchestrator sends it to Agent A again. Fix: track routing history and never route to the same agent twice for the same request.
  2. Context window exhaustion. In a pipeline, each stage adds output without removing input. By stage 4, you are sending 30,000 tokens of accumulated context. Fix: each stage should summarize its input before processing, or extract only the fields it needs.
  3. Confidence cascading. Agent A returns a low-confidence result. Agent B receives it without the confidence score, treats it as fact, and builds a high-confidence answer on a shaky foundation. Fix: always propagate confidence scores through the message protocol.
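The routing-loop fix in particular is cheap to implement. A minimal sketch (the `RoutingHistory` class name is illustrative):

```typescript
// Routing-loop guard: remember which agents have already seen a request
// and refuse to route the same request to the same agent twice.
class RoutingHistory {
  private seen = new Map<string, Set<string>>();

  // Returns true if this agent may receive this request, and records it.
  tryRoute(requestId: string, agentName: string): boolean {
    const agents = this.seen.get(requestId) ?? new Set<string>();
    if (agents.has(agentName)) {
      return false; // would loop — force the orchestrator to pick another
    }
    agents.add(agentName);
    this.seen.set(requestId, agents);
    return true;
  }
}
```

The orchestrator calls `tryRoute` before every dispatch; a `false` means it must either pick a different specialist or return an honest "cannot handle this" to the user.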

When NOT to Use Multi-Agent

I want to be direct about this because the industry is drowning in multi-agent hype. Here are situations where a single agent is the better choice:

Do not use multi-agent when:

  • Your task fits in one prompt. If a single system prompt with good examples handles the job, adding an orchestrator just adds latency and cost.
  • You have fewer than three distinct domains. Two agents plus an orchestrator is three LLM calls for what one agent does in one call.
  • Latency matters more than quality. Every additional agent adds 1-5 seconds. For real-time chat, that kills user experience.
  • You cannot define clear boundaries between agents. If you struggle to describe what each agent is responsible for, you are forcing a separation that does not exist.
  • Your team cannot debug distributed systems. Multi-agent systems fail in non-obvious ways. If your team has never debugged a microservices architecture, start with a single agent.

Do use multi-agent when:

  • Context pollution is degrading quality. The clearest signal. When one agent starts confusing tasks from different domains, split them.
  • You need agents to verify each other's work. A writer agent plus a fact-checker agent catches errors that a single agent misses.
  • Different tasks need different models. Maybe your code agent needs Opus for reasoning, but your summarizer works fine on Haiku. Multi-agent lets you optimize cost per task.
  • Parallel execution provides meaningful speedup. If you have 5 independent subtasks and running them concurrently cuts response time from 15 seconds to 4, that is a real win.

The decision framework is simple: start with one agent. Add agents only when you hit a specific, measurable problem that separation solves. Document why you split. I have seen teams build 12-agent systems that perform worse than a single well-prompted Claude Opus call.

My Production Setup

Here is what IAMUVIN OS actually looks like in practice. This is not theoretical — this is what runs my development and consulting work every day.

Architecture: Orchestrator pattern with 5 specialist agents.

┌──────────────────────────────────────────────────────┐
│               IAMUVIN OS Orchestrator                │
│               (Claude Opus — routing)                │
└────┬──────────┬──────────┬──────────┬──────────┬─────┘
     │          │          │          │          │
┌────┴───┐ ┌────┴───┐ ┌────┴───┐ ┌────┴───┐ ┌────┴───┐
│  Code  │ │Content │ │Business│ │Research│ │ Comms  │
│ Agent  │ │ Agent  │ │ Agent  │ │ Agent  │ │ Agent  │
│ (Opus) │ │(Sonnet)│ │ (Opus) │ │(Sonnet)│ │(Sonnet)│
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘

Agent responsibilities:

  • Code Agent (Opus): Reviews pull requests, suggests architecture decisions, debugs production issues. Needs Opus for complex reasoning across large codebases.
  • Content Agent (Sonnet): Drafts articles, social posts, documentation. Sonnet handles this well and costs less per task.
  • Business Agent (Opus): Financial analysis, project scoping, proposal generation. Needs Opus for nuanced business reasoning.
  • Research Agent (Sonnet): Gathers information, summarizes papers, competitive analysis. Speed matters more than depth.
  • Comms Agent (Sonnet): Drafts emails, client updates, meeting summaries. Template-driven work that Sonnet excels at.

What I learned building this:

  1. Model selection per agent matters more than you think. Switching the research agent from Opus to Sonnet cut costs by 60% with no measurable quality loss. The code agent on Sonnet, however, missed subtle bugs that Opus catches.
  2. The orchestrator needs to be your best model. A misrouted request wastes the entire call chain. I tried running the orchestrator on Haiku for cost savings. Bad idea — it routed security questions to the content agent.
  3. Logging every agent interaction is non-negotiable. When something goes wrong (and it will), you need to see the full chain: what the orchestrator decided, what it sent to the specialist, and what came back. I log to Supabase with request IDs linking the full chain.
  4. Keep agent system prompts under 500 tokens. Long system prompts eat into context that agents need for the actual task. Be concise. If your system prompt is over 500 tokens, you are probably trying to make one agent do too many things.
  5. Claude Code already does multi-agent well. Before you build a custom system, try Claude Code with custom slash commands. I use it daily with specialized skills that function as lightweight agents. For many workflows, this is enough and you avoid building infrastructure.
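For the logging lesson above, the log records themselves are simple. This sketch shows the shape I mean — field names are illustrative, and the real system persists them to a database table rather than keeping them in memory:

```typescript
// Hypothetical log-record shape for agent interactions. Request IDs chain
// child calls to the orchestrator decision that triggered them.
interface AgentLogEntry {
  requestId: string;
  parentRequestId: string | null; // links specialist calls to the orchestrator
  agent: string;
  direction: 'orchestrator->agent' | 'agent->orchestrator';
  preview: string;   // first 200 chars — enough to debug without duplicating payloads
  timestamp: number;
}

function makeLogEntry(
  requestId: string,
  parentRequestId: string | null,
  agent: string,
  direction: AgentLogEntry['direction'],
  fullText: string
): AgentLogEntry {
  return {
    requestId,
    parentRequestId,
    agent,
    direction,
    preview: fullText.slice(0, 200),
    timestamp: Date.now(),
  };
}
```

Querying by `parentRequestId` reconstructs the full chain for any user request: the routing decision, the specialist call, and the response.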

The total cost of running IAMUVIN OS is roughly $200-400/month in API calls, depending on volume. That is less than one hour of my consulting rate, and it saves me 15-20 hours per month. The ROI is obvious, but only because I kept the system simple — five agents, one pattern, clear boundaries.

Key Takeaways

  • Multi-agent systems are specialized LLM calls coordinated by code, not magic. Start with the orchestrator pattern — it handles 90% of real-world use cases.
  • Define strict boundaries between agents. If you cannot describe each agent's job in one sentence, your boundaries are wrong.
  • Use structured message passing between agents, not raw text. Type your messages. Validate at every boundary. Propagate confidence scores.
  • Plan for failure from day one. Implement retries, fallback agents, and routing loop detection before you deploy anything.
  • Start with a single agent. Only add agents when you hit a measurable problem — context pollution, quality degradation, or cost optimization needs. Most projects never need more than one.

*Last updated: February 2024*

Written by Uvin Vindula

Uvin Vindula (IAMUVIN) is a Web3 and AI engineer based in Sri Lanka and the United Kingdom. He is the author of The Rise of Bitcoin, Director of Blockchain and Software Solutions at Terra Labz, and founder of uvin.lk — Sri Lanka's Bitcoin education platform with 10,000+ learners.

For development projects: hello@iamuvin.com · Book a call: calendly.com/iamuvin
