So far, our agent can answer questions, stream responses, remember history, reduce context, use tools, return structured output, connect to MCP and load Agent Skills. That is already a lot. And that is exactly where the next problem starts.

It is tempting to keep adding instructions and tools to one large agent. Make it a coding assistant. Also make it a music expert. Also make it answer questions about coffee. Also give it support tools, documentation tools and internal process rules.

At some point, the agent becomes a jack of all trades. The prompt grows. The tool list grows. The model has more irrelevant instructions to ignore. And you pay for unnecessary input tokens on every request, even when the user only asks a simple question.

One practical way out is manual routing. Instead of giving one agent every responsibility, we split the system into smaller specialized agents and put a cheap intent agent in front of them. The intent agent does not answer the user. It only decides where the request should go.

The Problem with One Large Agent

A single large agent looks simple at first. You have one entry point for everything. But the simplicity is misleading. If the same agent knows about coffee, music, support tickets, code review and internal documentation, each request carries baggage. A question about a guitar amp still pays for coffee instructions. A coffee question still carries music instructions. A small-talk message still sees tool descriptions it will never need.

This creates three problems:

  • More input tokens per request
  • More room for the model to pick the wrong behavior
  • More difficult prompts to maintain and test

The goal is not to create a fancy multi-agent architecture. The goal is to keep each model call focused.

The Intent Agent

The intent agent is a dispatcher. It sits at the front of the system and classifies the user request. For this example, we keep the domain intentionally small:

  • Coffee questions go to a coffee expert agent
  • Music questions go to a music expert agent
  • Everything else gets a controlled fallback

The intent agent has one strict rule:

It classifies the request. It does not answer the request.

That rule matters. If the router starts answering directly, it becomes another general-purpose assistant. Then the system has two problems instead of one.

Because the intent agent only classifies, it can usually run on a smaller and cheaper model than the specialist agents. Choosing a route rarely needs deep reasoning. It needs a reliable category.

Use Structured Output for the Routing Decision

Do not let the intent agent return plain text like this:

This is probably a music question.

That would force your application to parse generated text. String parsing is a weak boundary. We already looked at this in the structured output article.

For routing, define a small C# contract instead:

public enum UserIntent
{
    Coffee,
    Music,
    Other
}

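public sealed class IntentResult
{
    // The route the application should take.
    public UserIntent Intent { get; set; }

    // Model-reported score, expected between 0.0 and 1.0. A routing signal, not a calibrated probability.
    public double Confidence { get; set; }

    // Short explanation from the router. Handy in logs, not meant for control flow.
    public string Reason { get; set; } = string.Empty;
}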

Now the router returns data your application can use directly.

using Microsoft.Agents.AI;

// smallChatClient is a chat client for a small, inexpensive model, created elsewhere.
AIAgent intentAgent = smallChatClient.AsAIAgent(
    name: "intent-router",
    instructions: """
    You classify user requests for routing.

    Return Coffee when the user asks about brewing methods, beans, grind size,
    ratios, extraction, espresso or coffee gear.

    Return Music when the user asks about songs, albums, artists, instruments,
    music theory, or tone recommendations.

    Return Other when the request does not clearly belong to Coffee or Music.

    Do not answer the user's question.
    Only classify the request.
    """);

Then call it with RunAsync<T>:

string userMessage = "How do I get a dirty Hendrix tone on my Strat?";

AgentResponse<IntentResult> intentResponse =
    await intentAgent.RunAsync<IntentResult>(
        $"""
        Classify this request.

        User request:
        {userMessage}
        """);

IntentResult route = intentResponse.Result;

The useful part is the boundary. Your application does not receive a sentence that it still has to interpret. It receives an IntentResult.

Structured output support still depends on the agent type, provider, model and chat client. With ChatClientAgent and compatible chat clients, RunAsync<T> is the cleanest option when the output type is known at compile time. If your provider does not support this reliably, use an explicit JSON schema via response format or add a retry and validation layer.
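If you end up needing the retry and validation layer, it could look roughly like the sketch below. This is an illustration, not part of Agent Framework: `IntentParser`, `TryParse` and `ClassifyWithRetryAsync` are hypothetical names, and `classifyRaw` is a stand-in delegate for whatever call returns the router's raw JSON text. It also assumes confidence is reported between 0.0 and 1.0. The contract types are repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

// Stub standing in for the real router call: the first reply is prose, the second is valid JSON.
var replies = new Queue<string>(new[]
{
    "This is probably a music question.",
    """{"intent":"Music","confidence":0.9,"reason":"guitar tone"}"""
});

IntentResult route = await IntentParser.ClassifyWithRetryAsync(
    () => Task.FromResult(replies.Dequeue()));
Console.WriteLine($"{route.Intent} ({route.Confidence})");

public enum UserIntent { Coffee, Music, Other }

public sealed class IntentResult
{
    public UserIntent Intent { get; set; }
    public double Confidence { get; set; }
    public string Reason { get; set; } = string.Empty;
}

public static class IntentParser
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true,
        // Read enum values from their names ("Music") and reject raw integers.
        Converters = { new JsonStringEnumConverter(allowIntegerValues: false) }
    };

    // Returns true only when the text is valid JSON AND passes the business checks.
    public static bool TryParse(string json, out IntentResult? result)
    {
        result = null;
        try
        {
            IntentResult? parsed = JsonSerializer.Deserialize<IntentResult>(json, Options);
            if (parsed is null) return false;
            if (!Enum.IsDefined(parsed.Intent)) return false;
            // Written this way so NaN is rejected as well.
            if (!(parsed.Confidence >= 0 && parsed.Confidence <= 1)) return false;

            result = parsed;
            return true;
        }
        catch (JsonException)
        {
            // Not JSON, wrong shape, or an unknown enum name.
            return false;
        }
    }

    // Asks again on invalid output, then falls back to a controlled "Other" result.
    public static async Task<IntentResult> ClassifyWithRetryAsync(
        Func<Task<string>> classifyRaw, int maxAttempts = 2)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string json = await classifyRaw();
            if (TryParse(json, out IntentResult? result)) return result!;
        }

        return new IntentResult
        {
            Intent = UserIntent.Other,
            Confidence = 0,
            Reason = "Router output failed validation."
        };
    }
}
```

Falling back to `Other` with zero confidence keeps a broken router reply on the same path as an unclear request, so the rest of the application needs no special case for it.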

C# Takes the Wheel

Once you have an IntentResult, stop asking the model to orchestrate. Use normal C#.

AIAgent coffeeAgent = coffeeChatClient.AsAIAgent(
    name: "coffee-expert",
    instructions: """
    You answer coffee questions.
    Be practical and specific about brewing, beans, ratios and equipment.
    """);

AIAgent musicAgent = musicChatClient.AsAIAgent(
    name: "music-expert",
    instructions: """
    You answer music questions.
    Be practical and specific about instruments, tone, artists and recordings.
    """);

string finalAnswer = route.Intent switch
{
    UserIntent.Coffee =>
        (await coffeeAgent.RunAsync(userMessage)).Text,

    UserIntent.Music =>
        (await musicAgent.RunAsync(userMessage)).Text,

    _ =>
        "I can help with coffee or music questions. Please rephrase the request."
};

This is the core pattern:

The intent agent classifies. C# routes. The specialist agent answers.

For Other, no second model call is required. You can return a fixed message, ask a clarification question, show supported topics, or route to a generic fallback agent if that makes sense for your application.

The important point is control. The model does not decide which expensive agent gets called. Your application does.
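Because the routing step is ordinary C#, you can test it without calling any model. The sketch below is one possible shape, not the article's implementation: `IntentRouter` is a hypothetical class, and the specialists are plain `Func<string, Task<string>>` delegates. In a real application those delegates would wrap the specialist agents. Here they are stubs that tag the message.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stub specialists for the demo. Real ones would call the coffee and music agents.
var router = new IntentRouter(
    new Dictionary<UserIntent, Func<string, Task<string>>>
    {
        [UserIntent.Coffee] = msg => Task.FromResult($"[coffee] {msg}"),
        [UserIntent.Music] = msg => Task.FromResult($"[music] {msg}"),
    },
    fallbackMessage: "I can help with coffee or music questions. Please rephrase the request.");

Console.WriteLine(await router.RouteAsync(UserIntent.Music, "How do I get a dirty Hendrix tone on my Strat?"));
Console.WriteLine(await router.RouteAsync(UserIntent.Other, "What is the weather like?"));

public enum UserIntent { Coffee, Music, Other }

public sealed class IntentRouter
{
    private readonly IReadOnlyDictionary<UserIntent, Func<string, Task<string>>> _specialists;
    private readonly string _fallbackMessage;

    public IntentRouter(
        IReadOnlyDictionary<UserIntent, Func<string, Task<string>>> specialists,
        string fallbackMessage)
    {
        _specialists = specialists;
        _fallbackMessage = fallbackMessage;
    }

    // Calls the registered specialist, or returns the fixed fallback without any model call.
    public Task<string> RouteAsync(UserIntent intent, string userMessage) =>
        _specialists.TryGetValue(intent, out var specialist)
            ? specialist(userMessage)
            : Task.FromResult(_fallbackMessage);
}
```

A dictionary instead of a switch is a design choice. It makes it easy to register a new specialist in one place, but for two or three routes the switch expression is just as readable.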

Add Confidence Before Routing Expensive Work

Do not blindly trust the router. The router is still an LLM call. It can be wrong. It can be uncertain. It can over-classify vague requests.

That is why the IntentResult includes a confidence score. But treat that confidence as a routing signal, not as truth.

Model-generated confidence is not automatically calibrated. A result with 0.9 does not necessarily mean the route is correct 90% of the time. It only means the router expressed high confidence.

You can still use it as a practical gate:

// Low confidence: ask the user instead of paying for a specialist call.
if (route.Confidence < 0.75)
{
    return "Is this about coffee or music?";
}

string finalAnswer = route.Intent switch
{
    UserIntent.Coffee =>
        (await coffeeAgent.RunAsync(userMessage)).Text,

    UserIntent.Music =>
        (await musicAgent.RunAsync(userMessage)).Text,

    _ =>
        "I can help with coffee or music questions."
};

The exact threshold depends on your application. For a casual assistant, a wrong route may not matter much. For support automation, routing errors can waste time or trigger the wrong downstream process.
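One way to keep the gate testable is to pull it into a pure function that returns a decision instead of calling agents directly. The following is a minimal sketch under that assumption. `RoutingGate` and `RoutingDecision` are hypothetical names, and the agent names are taken from the examples above.

```csharp
using System;

RoutingDecision decision = RoutingGate.Decide(UserIntent.Music, confidence: 0.62);
Console.WriteLine(decision.FallbackUsed
    ? decision.FallbackMessage
    : $"Route to {decision.SelectedAgent}");

public enum UserIntent { Coffee, Music, Other }

// SelectedAgent is null when no specialist runs. FallbackMessage is null when one does.
public sealed record RoutingDecision(string? SelectedAgent, bool FallbackUsed, string? FallbackMessage);

public static class RoutingGate
{
    public static RoutingDecision Decide(UserIntent intent, double confidence, double threshold = 0.75)
    {
        // Negated comparison so that NaN is also treated as low confidence.
        if (!(confidence >= threshold))
        {
            return new RoutingDecision(null, true, "Is this about coffee or music?");
        }

        return intent switch
        {
            UserIntent.Coffee => new RoutingDecision("coffee-expert", false, null),
            UserIntent.Music => new RoutingDecision("music-expert", false, null),
            _ => new RoutingDecision(null, true, "I can help with coffee or music questions.")
        };
    }
}
```

The function has no model dependency, so a handful of unit tests can pin down what happens at the threshold and for `Other`.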

Measure this with real examples. Do not tune the threshold from intuition alone.

Create a small labeled dataset of representative user requests. Run the intent agent against it. Track how often each intent is classified correctly. Then decide where the confidence threshold should sit.
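The evaluation loop can be small. The sketch below assumes you have already run the router over your labeled examples and stored the expected intent, the predicted intent and the reported confidence. `LabeledPrediction`, `ThresholdStats` and `RoutingEvaluation` are hypothetical names, and the four sample rows are invented placeholders, not measurements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder data, invented for illustration only.
var sample = new List<LabeledPrediction>
{
    new(UserIntent.Music, UserIntent.Music, 0.95),
    new(UserIntent.Coffee, UserIntent.Coffee, 0.90),
    new(UserIntent.Coffee, UserIntent.Music, 0.60),
    new(UserIntent.Other, UserIntent.Coffee, 0.70),
};

foreach (ThresholdStats stats in RoutingEvaluation.Sweep(sample, new[] { 0.5, 0.75 }))
{
    Console.WriteLine(
        $"threshold {stats.Threshold}: routed {stats.Routed}, correct {stats.Correct}, accuracy {stats.Accuracy:P0}");
}

public enum UserIntent { Coffee, Music, Other }

public sealed record LabeledPrediction(UserIntent Expected, UserIntent Predicted, double Confidence);

public sealed record ThresholdStats(double Threshold, int Routed, int Correct)
{
    // Share of routed requests that went to the right place (0 when nothing was routed).
    public double Accuracy => Routed == 0 ? 0 : (double)Correct / Routed;
}

public static class RoutingEvaluation
{
    // For each candidate threshold: how many requests would pass the gate, and how many of those are right.
    public static IReadOnlyList<ThresholdStats> Sweep(
        IReadOnlyCollection<LabeledPrediction> predictions,
        IEnumerable<double> thresholds) =>
        thresholds
            .Select(t =>
            {
                var routed = predictions.Where(p => p.Confidence >= t).ToList();
                int correct = routed.Count(p => p.Expected == p.Predicted);
                return new ThresholdStats(t, routed.Count, correct);
            })
            .ToList();
}
```

A higher threshold usually routes fewer requests but with fewer mistakes. The sweep makes that tradeoff visible for your own data instead of leaving it to intuition.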

Confidence is useful. But only after you have checked how it behaves in your actual domain.

Make Routing Observable

Manual routing also gives you a clean evaluation point. Because the router returns a typed result before any specialist agent runs, you can log the routing decision separately from the final answer. For example:

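// selectedAgent and fallbackUsed are values your routing code records
// when it picks a branch. They are not part of IntentResult.
logger.LogInformation(
    "Intent routed. Intent={Intent}, Confidence={Confidence}, SelectedAgent={SelectedAgent}, FallbackUsed={FallbackUsed}",
    route.Intent,
    route.Confidence,
    selectedAgent,
    fallbackUsed);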

Useful fields include:

  • userMessage
  • predictedIntent
  • confidence
  • selectedAgent
  • fallbackUsed

In a real system, you may not want to log the full user message directly. Depending on your privacy and compliance requirements, you might log a request id, a redacted message or a hashed reference instead.
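For the hashed reference, a keyed hash is one reasonable option. A plain hash of a short message can be guessed by hashing likely inputs, so this sketch uses HMAC-SHA256 with a secret key. `LogRedaction` and `MessageReference` are hypothetical names, and truncating to 8 bytes is an arbitrary choice for readable logs. Whether this satisfies your compliance requirements is something the sketch cannot decide for you.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Demo key only. A real key would come from configuration or a secret store,
// so that the same message maps to the same reference across restarts.
byte[] key = RandomNumberGenerator.GetBytes(32);

string reference = LogRedaction.MessageReference("How do I get a dirty Hendrix tone on my Strat?", key);
Console.WriteLine($"MessageRef={reference}");

public static class LogRedaction
{
    // Returns a 16-character hex reference that identifies the message without containing it.
    public static string MessageReference(string userMessage, byte[] key)
    {
        using var hmac = new HMACSHA256(key);
        byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(userMessage));

        // The first 8 bytes are enough to correlate log entries.
        return Convert.ToHexString(hash, 0, 8);
    }
}
```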

The important part is that routing becomes measurable. You can now answer questions like:

  • Which intents are confused most often?
  • How often does the fallback trigger?
  • Which confidence ranges produce the most wrong routes?
  • Are some user request types consistently misclassified?
  • Did a model upgrade improve or damage routing quality?
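The first question in that list can be answered with a few lines of LINQ once some routing decisions have been reviewed and labeled. This is a sketch under that assumption. `ReviewedRoute` and `RoutingReport` are hypothetical names, and the sample rows are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented sample of reviewed routing decisions.
var reviewed = new List<ReviewedRoute>
{
    new(UserIntent.Music, UserIntent.Music),
    new(UserIntent.Coffee, UserIntent.Music),
    new(UserIntent.Coffee, UserIntent.Music),
    new(UserIntent.Other, UserIntent.Coffee),
};

foreach (var pair in RoutingReport.ConfusedPairs(reviewed))
{
    Console.WriteLine($"expected {pair.Expected}, routed as {pair.Predicted}: {pair.Count}x");
}

public enum UserIntent { Coffee, Music, Other }

public sealed record ReviewedRoute(UserIntent Expected, UserIntent Predicted);

public static class RoutingReport
{
    // Counts each (expected, predicted) mismatch and lists the most frequent confusion first.
    public static IReadOnlyList<(UserIntent Expected, UserIntent Predicted, int Count)> ConfusedPairs(
        IEnumerable<ReviewedRoute> routes) =>
        routes
            .Where(r => r.Expected != r.Predicted)
            .GroupBy(r => (r.Expected, r.Predicted))
            .Select(g => (g.Key.Expected, g.Key.Predicted, Count: g.Count()))
            .OrderByDescending(x => x.Count)
            .ToList();
}
```

Running the same report before and after a model upgrade gives a direct comparison of routing quality.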

This is one of the underrated benefits of manual routing. You are not only saving tokens. You are creating a small, testable control point in front of the rest of the system.

Why This Saves Tokens

Manual routing helps because each agent receives only the context it needs. But keep in mind, routing is not free. It adds one model call before the specialist call.

Routing only pays off when the router call costs less than the irrelevant context it lets you skip. If your specialist prompts are tiny, the extra router call may not be worth it. But once prompts, tools and model sizes start to diverge, routing becomes useful quickly.
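A quick back-of-the-envelope calculation makes the break-even point concrete. The sketch below compares input-token cost only and ignores output tokens and the added latency of the router call. `RoutingCost` is a hypothetical helper, and every token count and price in the example is made up. Plug in your own numbers.

```csharp
using System;

// Invented numbers: a 6,000-token combined prompt versus a 300-token router
// on a cheaper model plus a 2,000-token specialist prompt.
decimal large = RoutingCost.SavingsPerRequest(
    singleAgentTokens: 6_000, routerTokens: 300, specialistTokens: 2_000,
    largeModelPrice: 3.00m, smallModelPrice: 0.15m);

// Invented numbers: tiny prompts on the same model, where the router call is pure overhead.
decimal tiny = RoutingCost.SavingsPerRequest(
    singleAgentTokens: 400, routerTokens: 300, specialistTokens: 250,
    largeModelPrice: 3.00m, smallModelPrice: 3.00m);

Console.WriteLine($"Large prompts: {large} per request");
Console.WriteLine($"Tiny prompts:  {tiny} per request");

public static class RoutingCost
{
    // Input cost of one call, with prices given per one million tokens.
    public static decimal InputCost(int tokens, decimal pricePerMillion) =>
        tokens / 1_000_000m * pricePerMillion;

    // Positive result: routing is cheaper. Negative result: the router costs more than it saves.
    public static decimal SavingsPerRequest(
        int singleAgentTokens, int routerTokens, int specialistTokens,
        decimal largeModelPrice, decimal smallModelPrice) =>
        InputCost(singleAgentTokens, largeModelPrice)
        - (InputCost(routerTokens, smallModelPrice) + InputCost(specialistTokens, largeModelPrice));
}
```

With these placeholder values the first case comes out positive and the second negative, which is the pattern described above: routing helps when the avoided context is large and the router is cheap.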

The intent agent can stay small:

  • short instructions
  • no domain tools
  • no long specialist prompt
  • cheap model
  • structured output only

The specialist agents can stay focused:

  • coffee instructions only for coffee questions
  • music instructions only for music questions
  • domain tools only where they are useful
  • stronger models only when the request deserves them

This does not make token limits disappear. It changes which instructions, tools and context are sent to which model call.

Instead of this:

Every request
  -> coffee prompt + music prompt + all tools + all rules

You get this:

Every request
  -> small routing prompt

Only music requests
  -> music prompt + music tools

Only coffee requests
  -> coffee prompt + coffee tools

That is the real win. You avoid paying for irrelevant context on every request.

Manual Routing vs. Workflow Engine

This article uses plain C# routing on purpose. You could model routing as a workflow later. Agent Framework has a workflow engine for explicit orchestration, checkpoints, handoffs and human-in-the-loop scenarios. But this example does not need that yet. The flow is simple:

  1. Classify intent
  2. Switch on the result
  3. Call one specialist
  4. Return the answer

A C# switch is easier to read, easier to test and easier to debug than a workflow graph for this case. This is a useful design rule:

Use the smallest orchestration mechanism that gives you enough control.

For simple routing, that is often normal C#.

When to Use Manual Intent Routing

Use manual intent routing when:

  • You have clearly separated domains
  • One large prompt is becoming too expensive or unfocused
  • Different requests need different tools
  • Different requests deserve different model sizes
  • You want predictable routing logic in application code
  • You can evaluate routing quality with realistic examples

Do not use manual intent routing when:

  • A single small agent is already enough
  • The categories are vague and constantly overlapping
  • A wrong route would be expensive and you have no validation
  • The router becomes as complex as the system it replaces
  • You actually need checkpoints, long-running execution or human approval

This pattern is not a universal architecture. It is a cheap and practical first step into multi-agent systems.

Conclusion

Manual multi-agent routing is simple. Use a small intent agent to classify the request. Return a typed IntentResult. Let C# route to the right specialist agent.

This keeps the expensive agents focused and avoids sending every instruction and every tool to every model call. It also gives your application a clean control point for fallbacks, confidence thresholds, logging and tests.

The main limitation is that routing quality becomes part of your system quality. You need examples, thresholds and fallback behavior. But that is still easier to reason about than one overloaded agent that tries to be everything at once.

Next, we can take this idea further. Instead of routing to agents from C#, we can expose agents as tools and let one coordinator delegate work deliberately. That gives more flexibility, but also brings back token, cost and control tradeoffs.

Further Reading