Agent Framework RAG for Agents: Giving Your Agent the Right Context

In the previous article, we looked at workflows. Workflows make sense when the process itself needs structure: state, checkpoints, events, human approvals, and resumable execution.

This post is the bridge from Agent Framework into RAG. A full RAG deep dive will come later. The practical question for now is smaller:

How do I connect an Agent Framework agent to private application knowledge without stuffing every document into the prompt?

For agents, RAG is less about adding more text and more about giving the agent a controlled retrieval path. The agent should fetch the right context at the point where it needs it.

Agents do not know your private data

Your company documents, product catalog, tickets, rules, policies, runbooks, and internal knowledge base live outside the model. The model has generic knowledge. Your application has private knowledge. Treat those as separate systems.

You can paste some private data into the prompt, and for a demo that may be enough. But this falls apart quickly:

full documents are expensive to send repeatedly
long prompts are fragile
stale documents may sit next to current ones
users may not be allowed to see every source
long context still needs selection

The last point is easy to underestimate. A larger context window lets you send more text. It does not decide which text is correct, current, relevant, or permitted.

Do not give the agent all knowledge. Give it the right context at the moment it needs it.

Retrieval owns that job.

The minimal RAG shape

The basic RAG loop is small:

user question
-> retrieve relevant chunks
-> pass chunks to the agent
-> agent answers using that context

For documents, the longer pipeline usually looks like this:

documents
-> chunks
-> embeddings
-> vector store
-> search
-> retrieved context
-> agent response

Documents are split into smaller chunks. Those chunks are embedded into vectors. The vectors and source metadata are stored. When a user asks a question, the question is embedded too. The search layer finds nearby chunks and returns only those chunks to the agent.
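To make those stages concrete, here is a minimal in-memory sketch of the same pipeline. Everything in it is an assumption for illustration: `TinyRag`, `Chunk`, and the word-count "embedding" are hypothetical stand-ins, and a real system would call an embedding model and a vector store instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One stored chunk: where it came from, its text, and its toy "embedding".
public sealed record Chunk(string Source, string Text, Dictionary<string, int> Vector);

public static class TinyRag
{
    private static readonly char[] Separators =
        { ' ', '\n', '\r', '\t', '.', ',', ';', ':', '?', '!' };

    // documents -> chunks: fixed-size word windows, no overlap.
    public static IEnumerable<string> Split(string document, int wordsPerChunk = 40)
    {
        string[] words = document.Split(
            new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < words.Length; i += wordsPerChunk)
        {
            yield return string.Join(" ", words.Skip(i).Take(wordsPerChunk));
        }
    }

    // chunks -> embeddings. ASSUMPTION: word counts stand in for a real embedding model.
    public static Dictionary<string, int> Embed(string text)
    {
        var vector = new Dictionary<string, int>();
        foreach (string word in text.ToLowerInvariant()
                     .Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            vector.TryGetValue(word, out int count);
            vector[word] = count + 1;
        }
        return vector;
    }

    // embeddings -> vector store: here just a list in memory.
    public static List<Chunk> Index(IEnumerable<(string Source, string Text)> documents) =>
        documents
            .SelectMany(d => Split(d.Text).Select(t => new Chunk(d.Source, t, Embed(t))))
            .ToList();

    // search -> retrieved context: embed the question, return the nearest chunks.
    public static IReadOnlyList<Chunk> Search(
        IReadOnlyList<Chunk> store, string question, int limit)
    {
        Dictionary<string, int> query = Embed(question);
        return store
            .OrderByDescending(chunk => Cosine(query, chunk.Vector))
            .Take(limit)
            .ToList();
    }

    // Cosine similarity over sparse word-count vectors.
    private static double Cosine(Dictionary<string, int> a, Dictionary<string, int> b)
    {
        double dot = 0;
        foreach (KeyValuePair<string, int> entry in a)
        {
            if (b.TryGetValue(entry.Key, out int other)) dot += entry.Value * other;
        }
        double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return normA == 0 || normB == 0 ? 0 : dot / (normA * normB);
    }
}
```

Only the chunks returned by `Search` would be passed on to the agent. The word-count vectors only match exact words, which is the gap a real embedding model closes.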

Stop there for now.

There are some hard parts here: chunk boundaries, embedding model choice, hybrid search, reranking, freshness, access control, observability, and evals. They are just not the point yet.

For now, keep the boundary clear:

RAG is the retrieval layer around the agent. The agent is not the retrieval layer.

Agent Framework is not the RAG engine

Microsoft Agent Framework gives you the agent runtime. It does not give you a finished ingestion pipeline, chunking strategy, embedding setup, vector store, ranking model, permission model, freshness process, or retrieval eval suite.

Agent Framework helps you decide how the agent receives and uses context:

you can retrieve context before calling the agent
you can inject retrieved context through an AI context provider
you can expose retrieval as a function tool
you can make retrieval one step in a workflow

The retrieval system still belongs to your application architecture.

It might use Azure AI Search, PostgreSQL with pgvector, SQL Server vector search, Cosmos DB, Qdrant, Redis, a normal search index, or an internal HTTP API. The agent does not need to care.

The agent needs a focused capability. Not direct database access.

Retrieval as an agent tool

For many agent apps, I would start by exposing retrieval as a tool.

The tool is narrow:

SearchKnowledgeAsync(
    string query,
    string? category,
    int limit)

The agent can call it when the answer depends on private knowledge. Your application decides what the tool is allowed to search.

This matches the tool-design rule from earlier in the series:

Tools should expose controlled capabilities, not raw infrastructure.

A small version looks like this:

using System.ComponentModel;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

public sealed record KnowledgeSearchResult(
    string Title,
    string Source,
    string Snippet,
    double Score);

public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query,
        string? category,
        int limit,
        CancellationToken cancellationToken);
}

// Declare this method inside a static class in your application.
[Description("Searches approved internal knowledge articles, policies, and runbooks.")]
public static Task<IReadOnlyList<KnowledgeSearchResult>> SearchKnowledgeAsync(
    [Description("Focused search query. Rewrite the user's message into search terms.")]
    string query,
    [Description("Optional source category such as policy, runbook, product, support, or architecture.")]
    string? category,
    [Description("Maximum number of results to return. Use 3 to 5 for normal questions.")]
    int limit,
    // Supplied by the application at invocation time, not by the model.
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var search = services.GetRequiredService<IKnowledgeSearch>();

    return search.SearchAsync(
        query,
        category,
        Math.Clamp(limit, 1, 5),
        cancellationToken);
}

The model supplies query, category, and limit. The application supplies IKnowledgeSearch.

Keep that split.

The model can ask for a search. It does not get a connection string, a database client, or permission to browse every source.

Then attach the tool to the agent:

AIAgent supportAgent = chatClient.AsAIAgent(
    instructions: """
    You answer questions about the internal engineering platform.

    Use SearchKnowledgeAsync when the answer depends on private company
    documentation, runbooks, policies, known issues, or product rules.

    If the search results do not contain enough evidence, say that the indexed
    sources do not answer the question. Do not invent policy details, limits,
    prices, permissions, or operational steps.
    """,
    tools: [AIFunctionFactory.Create(SearchKnowledgeAsync)],
    services: app.Services);

The agent-side RAG flow is:

The user asks a question.
The agent decides it needs private knowledge.
The agent calls the retrieval tool with a focused query.
The application searches the allowed sources.
The agent receives a few results and answers from them.

At that point, retrieval is just another tool. The pattern fits Agent Framework because tools already give you that controlled application boundary.

The user message is not always the search query

Users ask messy questions.

For example:

What were the most important changes in our cancellation policy last year?

A better retrieval query might be:

cancellation policy changes last year

Or, if you expose metadata filters:

await SearchKnowledgeAsync(
    query: "cancellation policy changes last year",
    category: "policy",
    limit: 5,
    services,
    cancellationToken);

The agent can help here. It can translate a conversational request into a smaller retrieval query.

But do not overcomplicate this too early. Start by logging the generated tool query and checking whether it actually finds better results than the raw user message.

Bad query rewriting is worse than no query rewriting. It can remove the term that mattered.
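One way to check this is to run both queries against the same search and compare which sources come back. The sketch below assumes a hypothetical `search` delegate that takes a query and returns source identifiers in ranked order. `QueryRewriteCheck` and `RewriteComparison` are illustration names, not part of Agent Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record RewriteComparison(
    string ToolQuery,
    IReadOnlyList<string> Shared,
    IReadOnlyList<string> LostByRewrite,
    IReadOnlyList<string> GainedByRewrite);

public static class QueryRewriteCheck
{
    // `search` is a hypothetical stand-in for your retrieval call.
    public static RewriteComparison Compare(
        string userMessage,
        string toolQuery,
        Func<string, IReadOnlyList<string>> search)
    {
        IReadOnlyList<string> raw = search(userMessage);
        IReadOnlyList<string> rewritten = search(toolQuery);

        return new RewriteComparison(
            toolQuery,
            Shared: raw.Intersect(rewritten).ToList(),
            // Sources the raw message found but the rewritten query missed.
            LostByRewrite: raw.Except(rewritten).ToList(),
            // Sources only the rewritten query surfaced.
            GainedByRewrite: rewritten.Except(raw).ToList());
    }
}
```

A non-empty `LostByRewrite` is the signal to look at: it lists sources that disappeared when the agent shortened the question.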

Metadata filters keep retrieval inside the boundary

Vector similarity finds related text. It does not know whether that text belongs to the right tenant, product, language, version, source system, or user permission scope.

You often need filters.

Common filters include:

tenant
user permissions
document type
product
category
language
date
version
source system

Some filters can be model supplied. category is a reasonable example because the model can often infer whether a question is about a policy, runbook, product, or support article.

Some filters should not be model supplied.

Tenant, user ID, role, entitlement, and document permissions should come from your authenticated application context. The model should not be allowed to say:

Search tenant = admin

and suddenly see admin-only documents.

A better application boundary looks like this:

public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query,
        string? category,
        int limit,
        UserKnowledgeScope scope,
        CancellationToken cancellationToken);
}

The tool can accept the search query and category. Your application adds UserKnowledgeScope from the current user.

Similarity search finds related text. Metadata filters keep the search inside the right boundary.
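Here is a rough sketch of that split between model-supplied and application-supplied filters. The shape of `UserKnowledgeScope`, the `IndexedChunk` record, the `ScopedSearch` helper, and the `tenant_id` claim name are all assumptions for illustration. In a real vector store these filters would usually be part of the query instead of a LINQ pass over results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// ASSUMPTION: one possible shape for UserKnowledgeScope.
public sealed record UserKnowledgeScope(string TenantId, IReadOnlySet<string> Roles);

// Hypothetical indexed item carrying the metadata the filters need.
public sealed record IndexedChunk(
    string Title, string TenantId, string Category, string RequiredRole);

public static class ScopedSearch
{
    // Built from the authenticated user, never from model-supplied arguments.
    // The "tenant_id" claim name is an assumption; use what your identity provider issues.
    public static UserKnowledgeScope FromPrincipal(ClaimsPrincipal user)
    {
        string tenantId = user.FindFirst("tenant_id")?.Value
            ?? throw new InvalidOperationException("No tenant claim on the current user.");

        HashSet<string> roles = user.FindAll(ClaimTypes.Role)
            .Select(claim => claim.Value)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);

        return new UserKnowledgeScope(tenantId, roles);
    }

    // Tenant and role come from the scope; only category may come from the model.
    public static IReadOnlyList<IndexedChunk> Filter(
        IEnumerable<IndexedChunk> chunks, UserKnowledgeScope scope, string? category) =>
        chunks
            .Where(c => c.TenantId == scope.TenantId)
            .Where(c => scope.Roles.Contains(c.RequiredRole))
            .Where(c => category is null
                || string.Equals(c.Category, category, StringComparison.OrdinalIgnoreCase))
            .ToList();
}
```

The model can change `category` as much as it likes. It has no parameter through which it could change the tenant or the roles.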

Manual retrieval is still valid

Exposing retrieval as a tool is not the only option.

For a pure documentation assistant, you may not want the model to decide whether to search. You may want retrieval on every request.

Plain application code is enough:

IReadOnlyList<KnowledgeSearchResult> results =
    await knowledgeSearch.SearchAsync(
        query: userQuestion,
        category: null,
        limit: 5,
        cancellationToken);

string context = string.Join(
    "\n\n",
    results.Select(result => $"""
    Source: {result.Title}
    {result.Snippet}
    """));

AgentResponse response = await supportAgent.RunAsync($"""
    Answer the user's question using the retrieved context.
    If the context is not enough, say so.

    Retrieved context:
    {context}

    User question:
    {userQuestion}
    """,
    cancellationToken: cancellationToken);

You can also use Agent Framework context providers, such as TextSearchProvider, when that fits your setup. The tradeoff is the same either way:

automatic retrieval is predictable
retrieval as a tool is more selective

If almost every request needs private knowledge, retrieve before the agent call. If retrieval is one capability among several, expose it as a tool.

When RAG is the wrong tool

RAG is for finding relevant context. Code is for exact operations.

If the user asks:

What are the top 5 products by revenue?

that should probably be SQL or an analytics API, not vector search.

The same applies to:

exact lookups
IDs
prices
current status
rankings
totals
permissions
deterministic business rules

Vector search is good at finding related text. It is not a calculator, database constraint, authorization system, or reporting engine.

If the answer must be exact, use normal code behind a tool.

For example:

// ProductRevenue and IRevenueReporting are application types, not shown here.
[Description("Returns the top products by revenue for an authorized reporting period.")]
public static Task<IReadOnlyList<ProductRevenue>> GetTopProductsByRevenueAsync(
    DateOnly from,
    DateOnly to,
    int limit,
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var reporting = services.GetRequiredService<IRevenueReporting>();

    return reporting.GetTopProductsByRevenueAsync(
        from,
        to,
        Math.Clamp(limit, 1, 20),
        cancellationToken);
}

This still gives the agent a tool. It is just not RAG.

When I would use this

Use retrieval with an Agent Framework agent when:

the answer depends on private documents or records
the model’s generic knowledge is not enough
stuffing the prompt would be expensive or noisy
the agent can benefit from searching only when needed
the application can enforce source permissions behind the tool

Start with a narrow search tool. Log the query the agent sends. Log the sources returned. Check whether the answer actually used those sources.

That gives you enough signal to see where the retrieval design is weak.
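A small decorator is one way to get those logs without touching the search implementation. This is a sketch under assumptions: `LoggingKnowledgeSearch`, `AnswerCheck`, and `FixedKnowledgeSearch` are hypothetical names, the log sink is a plain `Action<string>` instead of a real logger, and the "did the answer use the source" check is a crude title match, not a real grounding eval.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Repeated from earlier in the article so the sketch compiles on its own.
public sealed record KnowledgeSearchResult(
    string Title, string Source, string Snippet, double Score);

public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query, string? category, int limit, CancellationToken cancellationToken);
}

// Hypothetical decorator: logs the query the agent sent and the sources returned.
public sealed class LoggingKnowledgeSearch : IKnowledgeSearch
{
    private readonly IKnowledgeSearch _inner;
    private readonly Action<string> _log;

    public LoggingKnowledgeSearch(IKnowledgeSearch inner, Action<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public async Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query, string? category, int limit, CancellationToken cancellationToken)
    {
        IReadOnlyList<KnowledgeSearchResult> results =
            await _inner.SearchAsync(query, category, limit, cancellationToken);

        string sources = string.Join(", ", results.Select(r => r.Source));
        _log($"query='{query}' category='{category}' limit={limit} sources=[{sources}]");
        return results;
    }
}

public static class AnswerCheck
{
    // Crude heuristic: which returned titles does the answer text mention?
    public static IReadOnlyList<string> MentionedTitles(
        string answer, IEnumerable<KnowledgeSearchResult> results) =>
        results
            .Select(r => r.Title)
            .Where(title => answer.Contains(title, StringComparison.OrdinalIgnoreCase))
            .ToList();
}

// Test double used only to exercise the sketch.
public sealed class FixedKnowledgeSearch : IKnowledgeSearch
{
    private readonly IReadOnlyList<KnowledgeSearchResult> _results;

    public FixedKnowledgeSearch(params KnowledgeSearchResult[] results) => _results = results;

    public Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query, string? category, int limit, CancellationToken cancellationToken) =>
        Task.FromResult<IReadOnlyList<KnowledgeSearchResult>>(_results.Take(limit).ToList());
}
```

Register the decorator around your real `IKnowledgeSearch` and the tool code stays unchanged. An answer that mentions none of the returned titles is worth reading by hand.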

When I would not use this

Do not use RAG when the task needs deterministic data access or computation.

Use normal code for current state, totals, rankings, exact IDs, prices, permissions, and business rules.

Do not use RAG as a way to bypass application boundaries. If a user cannot access a document in the product, the retrieval tool should not return it to the agent.

Also avoid building the full ingestion and retrieval platform before you have a real use case. Start with one domain, a small corpus, and a handful of questions you can verify.
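A handful of verifiable questions can be as small as a list of question and expected-source pairs. The sketch below is a minimal hit-rate check. `EvalCase`, `RetrievalEval`, and the `search` delegate are hypothetical names for illustration, and "expected source appears in the top results" is only one possible definition of a hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A question you can verify by hand, plus the source that should answer it.
public sealed record EvalCase(string Question, string ExpectedSource);

public static class RetrievalEval
{
    // Fraction of cases where the expected source appears in the top `limit` results.
    // `search` is a hypothetical stand-in returning source identifiers in ranked order.
    public static double HitRate(
        IReadOnlyList<EvalCase> cases,
        Func<string, IReadOnlyList<string>> search,
        int limit)
    {
        if (cases.Count == 0) return 0;

        int hits = cases.Count(c =>
            search(c.Question).Take(limit).Contains(c.ExpectedSource));

        return (double)hits / cases.Count;
    }
}
```

Run it before and after each change to chunking, filters, or query rewriting. With a small corpus the number is noisy, but a drop still tells you where to look.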

Conclusion

Agent Framework gives you a clean place to put retrieval into the agent loop. It does not make RAG automatic.

The design I would carry forward is simple:

keep private knowledge outside the base prompt
expose retrieval as a focused capability
let the application enforce permissions and filters
give the agent only the context it needs
use code, not RAG, for exact operations

The RAG deep dive will come later. The next Agent Framework post moves to multimodal agents: images, PDFs, and provider differences. The agent boundary gets messy there in a different way. Some providers can work with images or document inputs natively, some need different message formats, and some scenarios are still better handled by manual preprocessing before the agent sees anything.
