In the previous article, we looked at workflows. Workflows make sense when the process itself needs structure: state, checkpoints, events, human approvals, and resumable execution.
This post is the bridge from Agent Framework into RAG. I plan on doing a full RAG deep dive sometime later. The practical question for now is smaller:
How do I connect an Agent Framework agent to private application knowledge without stuffing every document into the prompt?
For agents, RAG is less about adding more text and more about giving the agent a controlled retrieval path. The agent should fetch the right context at the point where it needs it.
Agents do not know your private data
Your company documents, product catalog, tickets, rules, policies, runbooks, and internal knowledge base live outside the model. The model has generic knowledge. Your application has private knowledge. Treat those as separate systems.
You can paste some private data into the prompt, and for a demo that may be enough. But this falls apart quickly:
- full documents are expensive to send repeatedly
- long prompts are fragile
- stale documents may sit next to current ones
- users may not be allowed to see every source
- long context still needs selection
The last point is easy to underestimate. A larger context window lets you send more text. It does not decide which text is correct, current, relevant, or permitted.
Do not give the agent all knowledge. Give it the right context at the moment it needs it.
Retrieval owns that job.
The minimal RAG shape
The basic RAG loop is small:
user question
-> retrieve relevant chunks
-> pass chunks to the agent
-> agent answers using that context
For documents, the longer pipeline usually looks like this:
documents
-> chunks
-> embeddings
-> vector store
-> search
-> retrieved context
-> agent response
Documents are split into smaller chunks. Those chunks are embedded into vectors. The vectors and source metadata are stored. When a user asks a question, the question is embedded too. The search layer finds nearby chunks and returns only those chunks to the agent.
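The pipeline above fits in a few dozen lines if you accept a toy embedding. This is only a sketch: `ToyRag`, `StoredChunk`, and the hashed word-count `Embed` function are placeholders I made up for illustration. A real system would call an embedding model and a vector store instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy in-memory version of: documents -> chunks -> embeddings -> store -> search.
public sealed record StoredChunk(string Source, string Text, float[] Vector);

public static class ToyRag
{
    private const int Dimensions = 256;

    // Split a document into chunks of at most maxWords words.
    public static IEnumerable<string> Chunk(string text, int maxWords)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        for (var i = 0; i < words.Length; i += maxWords)
            yield return string.Join(' ', words.Skip(i).Take(maxWords));
    }

    // Placeholder embedding (assumption): hash each word into a fixed-size count vector.
    public static float[] Embed(string text)
    {
        var vector = new float[Dimensions];
        var words = text.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', '?', '!' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
            vector[StableHash(word) % Dimensions] += 1f;
        return vector;
    }

    // Embed the question and return the closest chunks by cosine similarity.
    public static IReadOnlyList<StoredChunk> Search(
        IEnumerable<StoredChunk> store, string question, int limit)
    {
        var queryVector = Embed(question);
        return store
            .OrderByDescending(chunk => Cosine(chunk.Vector, queryVector))
            .Take(limit)
            .ToList();
    }

    // Deterministic hash; string.GetHashCode is randomized per process.
    private static int StableHash(string word)
    {
        var hash = 17;
        foreach (var c in word) hash = unchecked(hash * 31 + c);
        return hash & int.MaxValue;
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

To use it, chunk each document, store every chunk with `Embed(chunk)`, and call `Search` with the user question. Only the returned chunks go to the agent.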
Stop there for now.
There are some hard parts here: chunk boundaries, embedding model choice, hybrid search, reranking, freshness, access control, observability, and evals. They are just not the point yet.
For now, keep the boundary clear:
RAG is the retrieval layer around the agent. The agent is not the retrieval layer.
Agent Framework is not the RAG engine
Microsoft Agent Framework gives you the agent runtime. It does not give you a finished ingestion pipeline, chunking strategy, embedding setup, vector store, ranking model, permission model, freshness process, or retrieval eval suite.
Agent Framework helps you decide how the agent receives and uses context:
- you can retrieve context before calling the agent
- you can inject retrieved context through an AI context provider
- you can expose retrieval as a function tool
- you can make retrieval one step in a workflow
The retrieval system still belongs to your application architecture.
It might use Azure AI Search, PostgreSQL with pgvector, SQL Server vector search, Cosmos DB, Qdrant, Redis, a normal search index, or an internal HTTP API. The agent does not need to care.
The agent needs a focused capability. Not direct database access.
Retrieval as an agent tool
For many agent apps, I would start by exposing retrieval as a tool.
The tool is narrow:
SearchKnowledgeAsync(
    string query,
    string? category,
    int limit)
The agent can call it when the answer depends on private knowledge. Your application decides what the tool is allowed to search.
This matches the tool-design rule from earlier in the series:
Tools should expose controlled capabilities, not raw infrastructure.
A small version looks like this:
using System.ComponentModel;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

public sealed record KnowledgeSearchResult(
    string Title,
    string Source,
    string Snippet,
    double Score);

public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query,
        string? category,
        int limit,
        CancellationToken cancellationToken);
}

[Description("Searches approved internal knowledge articles, policies, and runbooks.")]
public static Task<IReadOnlyList<KnowledgeSearchResult>> SearchKnowledgeAsync(
    [Description("Focused search query. Rewrite the user's message into search terms.")]
    string query,
    [Description("Optional source category such as policy, runbook, product, support, or architecture.")]
    string? category,
    [Description("Maximum number of results to return. Use 3 to 5 for normal questions.")]
    int limit,
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var search = services.GetRequiredService<IKnowledgeSearch>();

    return search.SearchAsync(
        query,
        category,
        Math.Clamp(limit, 1, 5),
        cancellationToken);
}
The model supplies query, category, and limit.
The application supplies IKnowledgeSearch.
Keep that split.
The model can ask for a search. It does not get a connection string, a database client, or permission to browse every source.
Then attach the tool to the agent:
AIAgent supportAgent = chatClient.AsAIAgent(
    instructions: """
        You answer questions about the internal engineering platform.
        Use SearchKnowledgeAsync when the answer depends on private company
        documentation, runbooks, policies, known issues, or product rules.
        If the search results do not contain enough evidence, say that the indexed
        sources do not answer the question. Do not invent policy details, limits,
        prices, permissions, or operational steps.
        """,
    tools: [AIFunctionFactory.Create(SearchKnowledgeAsync)],
    services: app.Services);
The agent-side RAG flow is:
- The user asks a question.
- The agent decides it needs private knowledge.
- The agent calls the retrieval tool with a focused query.
- The application searches the allowed sources.
- The agent receives a few results and answers from them.
At that point, retrieval is just another tool. The pattern fits Agent Framework because tools already give you that controlled application boundary.
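To try the tool flow without a real index, a small in-memory implementation of the search interface is enough. This is a sketch under assumptions: `KnowledgeArticle` and `InMemoryKnowledgeSearch` are hypothetical names, and the scoring is plain keyword matching, not vector search. The record and interface are repeated so the block stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed record KnowledgeSearchResult(string Title, string Source, string Snippet, double Score);

public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query, string? category, int limit, CancellationToken cancellationToken);
}

// Hypothetical article shape used only by this sketch.
public sealed record KnowledgeArticle(string Title, string Source, string Category, string Body);

public sealed class InMemoryKnowledgeSearch : IKnowledgeSearch
{
    private readonly IReadOnlyList<KnowledgeArticle> _articles;

    public InMemoryKnowledgeSearch(IReadOnlyList<KnowledgeArticle> articles) => _articles = articles;

    public Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query, string? category, int limit, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        var terms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

        IReadOnlyList<KnowledgeSearchResult> results = _articles
            // Optional category filter supplied by the caller.
            .Where(a => category is null ||
                        a.Category.Equals(category, StringComparison.OrdinalIgnoreCase))
            .Select(a => new
            {
                Article = a,
                // Score is the fraction of query terms found in the article body.
                Score = terms.Length == 0
                    ? 0.0
                    : terms.Count(t => a.Body.Contains(t, StringComparison.OrdinalIgnoreCase))
                      / (double)terms.Length
            })
            .Where(x => x.Score > 0)
            .OrderByDescending(x => x.Score)
            .Take(limit)
            .Select(x => new KnowledgeSearchResult(
                x.Article.Title, x.Article.Source, x.Article.Body, x.Score))
            .ToList();

        return Task.FromResult(results);
    }
}
```

Register something like this as `IKnowledgeSearch` in a test host and the agent-side flow can be exercised end to end before any real retrieval backend exists.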
The user message is not always the search query
Users ask messy questions.
For example:
What were the most important changes in our cancellation policy last year?
A better retrieval query might be:
cancellation policy changes last year
Or, if you expose metadata filters:
await SearchKnowledgeAsync(
    query: "cancellation policy changes last year",
    category: "policy",
    limit: 5,
    services,
    cancellationToken);
The agent can help here. It can translate a conversational request into a smaller retrieval query.
But do not overcomplicate this too early. Start by logging the generated tool query and checking whether it actually finds better results than the raw user message.
Bad query rewriting is worse than no query rewriting. It can remove the term that mattered.
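One way to check this is to run both queries and compare where a known-good source lands. The sketch below assumes you can name the source that should answer a test question. `QueryRewriteCheck` and `QueryComparison` are hypothetical helpers, and the `search` delegate stands in for whatever retrieval call you already have.

```csharp
using System;
using System.Collections.Generic;

// Rank is 1-based; 0 means the expected source was not returned at all.
public sealed record QueryComparison(
    string RawQuery, string RewrittenQuery, int RawRank, int RewrittenRank)
{
    public bool RewriteHelped =>
        RewrittenRank != 0 && (RawRank == 0 || RewrittenRank < RawRank);

    public bool RewriteHurt =>
        RawRank != 0 && (RewrittenRank == 0 || RewrittenRank > RawRank);
}

public static class QueryRewriteCheck
{
    // Runs the raw user message and the agent's rewritten query through the
    // same search function and records the rank of the expected source in each.
    public static QueryComparison Compare(
        Func<string, IReadOnlyList<string>> search,
        string rawQuery,
        string rewrittenQuery,
        string expectedSource)
    {
        return new QueryComparison(
            rawQuery,
            rewrittenQuery,
            RankOf(search(rawQuery), expectedSource),
            RankOf(search(rewrittenQuery), expectedSource));
    }

    private static int RankOf(IReadOnlyList<string> sources, string expected)
    {
        for (var i = 0; i < sources.Count; i++)
            if (sources[i] == expected) return i + 1;
        return 0;
    }
}
```

A handful of these comparisons over logged tool calls is usually enough to tell whether the rewrite drops the term that mattered.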
Metadata filters keep retrieval inside the boundary
Vector similarity finds related text. It does not know whether that text belongs to the right tenant, product, language, version, source system, or user permission scope.
You often need filters.
Common filters include:
- tenant
- user permissions
- document type
- product
- category
- language
- date
- version
- source system
Some filters can be model supplied.
category is a reasonable example because the model can often infer whether a question is about a policy, runbook, product, or support article.
Some filters should not be model supplied.
Tenant, user ID, role, entitlement, and document permissions should come from your authenticated application context. The model should not be allowed to say:
Search tenant = admin
and suddenly see admin-only documents.
A better application boundary looks like this:
public interface IKnowledgeSearch
{
    Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
        string query,
        string? category,
        int limit,
        UserKnowledgeScope scope,
        CancellationToken cancellationToken);
}
The tool can accept the search query and category.
Your application adds UserKnowledgeScope from the current user.
Similarity search finds related text. Metadata filters keep the search inside the right boundary.
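Here is a minimal sketch of how that split could look behind the search interface. The members of `UserKnowledgeScope` are my assumption, since only the type name appears above, and `IndexedDocument` and `ScopedFilter` are hypothetical. In a real store these filters would usually be pushed into the search query rather than applied in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: built from the authenticated user, never from model output.
public sealed record UserKnowledgeScope(string TenantId, IReadOnlyCollection<string> Roles);

// Hypothetical indexed document carrying the metadata the filters need.
public sealed record IndexedDocument(
    string Title, string TenantId, string Category, string RequiredRole);

public static class ScopedFilter
{
    public static IEnumerable<IndexedDocument> Apply(
        IEnumerable<IndexedDocument> candidates,
        string? category,
        UserKnowledgeScope scope)
    {
        return candidates
            // Trusted filters: tenant and role come from the application context.
            .Where(d => d.TenantId == scope.TenantId)
            .Where(d => scope.Roles.Contains(d.RequiredRole))
            // Model-supplied filter: it can only narrow what is already allowed.
            .Where(d => category is null ||
                        string.Equals(d.Category, category, StringComparison.OrdinalIgnoreCase));
    }
}
```

The useful property is that the model-supplied `category` can never widen the result set. Whatever the model asks for, tenant and role checks have already run.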
Manual retrieval is still valid
Exposing retrieval as a tool is not the only option.
For a pure documentation assistant, you may not want the model to decide whether to search. You may want retrieval on every request.
Plain application code is enough:
IReadOnlyList<KnowledgeSearchResult> results =
    await knowledgeSearch.SearchAsync(
        query: userQuestion,
        category: null,
        limit: 5,
        cancellationToken);

string context = string.Join(
    "\n\n",
    results.Select(result => $"""
        Source: {result.Title}
        {result.Snippet}
        """));

AgentResponse response = await supportAgent.RunAsync($"""
    Answer the user's question using the retrieved context.
    If the context is not enough, say so.
    Retrieved context:
    {context}
    User question:
    {userQuestion}
    """,
    cancellationToken: cancellationToken);
You can also use Agent Framework context providers, such as TextSearchProvider, when that fits your setup.
The tradeoff is the same either way:
- automatic retrieval (on every request) is predictable
- retrieval as a tool is more selective, because the agent decides when to search
If almost every request needs private knowledge, retrieve before the agent call. If retrieval is one capability among several, expose it as a tool.
When RAG is the wrong tool
RAG is for finding relevant context. Code is for exact operations.
If the user asks:
What are the top 5 products by revenue?
that should probably be SQL or an analytics API, not vector search.
The same applies to:
- exact lookups
- IDs
- prices
- current status
- rankings
- totals
- permissions
- deterministic business rules
Vector search is good at finding related text. It is not a calculator, database constraint, authorization system, or reporting engine.
If the answer must be exact, use normal code behind a tool.
For example:
[Description("Returns the top products by revenue for an authorized reporting period.")]
public static Task<IReadOnlyList<ProductRevenue>> GetTopProductsByRevenueAsync(
    DateOnly from,
    DateOnly to,
    int limit,
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var reporting = services.GetRequiredService<IRevenueReporting>();

    return reporting.GetTopProductsByRevenueAsync(
        from,
        to,
        Math.Clamp(limit, 1, 20),
        cancellationToken);
}
This still gives the agent a tool. It is just not RAG.
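Behind that tool, the ranking is ordinary code. As a sketch, assuming a hypothetical `Order` record and my own guess at the shape of `ProductRevenue`, the in-memory equivalent of the SQL query is a filter, a group, a sum, and a sort:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; a real app would query its database.
public sealed record Order(string ProductName, DateOnly Date, decimal Amount);
public sealed record ProductRevenue(string ProductName, decimal Revenue);

public static class RevenueReport
{
    public static IReadOnlyList<ProductRevenue> TopProducts(
        IEnumerable<Order> orders, DateOnly fromDate, DateOnly toDate, int limit)
    {
        return orders
            // Exact date filter, inclusive on both ends.
            .Where(o => o.Date >= fromDate && o.Date <= toDate)
            .GroupBy(o => o.ProductName)
            .Select(g => new ProductRevenue(g.Key, g.Sum(o => o.Amount)))
            // Deterministic order: revenue first, then name to break ties.
            .OrderByDescending(p => p.Revenue)
            .ThenBy(p => p.ProductName)
            .Take(limit)
            .ToList();
    }
}
```

Every step here is exact and repeatable. No similarity score is involved, which is the point.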
When I would use this
Use retrieval with an Agent Framework agent when:
- the answer depends on private documents or records
- the model’s generic knowledge is not enough
- stuffing the prompt would be expensive or noisy
- the agent can benefit from searching only when needed
- the application can enforce source permissions behind the tool
Start with a narrow search tool. Log the query the agent sends. Log the sources returned. Check whether the answer actually used those sources.
That gives you enough signal to see where the retrieval design is weak.
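A rough way to capture that signal is one trace record per retrieval. This sketch assumes the agent is instructed to mention source titles in its answer, which the instructions earlier in this post do not require. `RetrievalTrace` is a hypothetical type, and the substring check is a crude proxy for "used the source", not a real grounding eval.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One record per tool call: what was asked, what came back, what was answered.
public sealed record RetrievalTrace(
    string Query, IReadOnlyList<string> ReturnedSources, string Answer)
{
    // Sources whose title appears verbatim in the answer (case-insensitive).
    public IReadOnlyList<string> CitedSources =>
        ReturnedSources
            .Where(source => Answer.Contains(source, StringComparison.OrdinalIgnoreCase))
            .ToList();

    public bool AnswerUsedSources => CitedSources.Count > 0;

    // Single-line summary suitable for a structured or plain-text log.
    public string ToLogLine()
    {
        var returned = string.Join(", ", ReturnedSources);
        var cited = string.Join(", ", CitedSources);
        return $"query=[{Query}] returned=[{returned}] cited=[{cited}]";
    }
}
```

Traces where sources were returned but none were cited are the first ones I would read by hand.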
When I would not use this
Do not use RAG when the task needs deterministic data access or computation.
Use normal code for current state, totals, rankings, exact IDs, prices, permissions, and business rules.
Do not use RAG as a way to bypass application boundaries. If a user cannot access a document in the product, the retrieval tool should not return it to the agent.
Also avoid building the full ingestion and retrieval platform before you have a real use case. Start with one domain, a small corpus, and a handful of questions you can verify.
Conclusion
Agent Framework gives you a clean place to put retrieval into the agent loop. It does not make RAG automatic.
The design I would carry forward is simple:
- keep private knowledge outside the base prompt
- expose retrieval as a focused capability
- let the application enforce permissions and filters
- give the agent only the context it needs
- use code, not RAG, for exact operations
As I said before, the full RAG deep dive will come later. The next Agent Framework post moves to multimodal agents: images, PDFs, and provider differences. The agent boundary gets messy there in a different way. Some providers can work with images or document inputs natively, some need different message formats, and some scenarios are still better handled by manual preprocessing before the agent sees anything.