Keep vector search filters separate from semantic ranking

In retrieval systems, filtering and ranking answer different questions. Filtering decides which documents are eligible. Ranking decides which eligible documents are most relevant.

Keep those concerns separate. Tenant ID, permissions, language, product area, document type, retention state, and security labels are usually filters. Semantic similarity, keyword score, freshness boosts, and reranking are relevance signals.

This matters because mixing them produces fragile behavior. No similarity score, however high, should let a document bypass authorization, and a low-similarity document should not be retrieved just because it matches a broad metadata tag. The retrieval pipeline needs both checks: eligibility as a hard constraint, relevance as an ordering.

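For RAG systems, apply hard filters before or during vector search when the constraint is mandatory. Then use semantic or hybrid ranking to order the allowed candidates. That keeps retrieval safer, because ineligible content is never a ranking candidate, and easier to debug and tune, because a missing document is either a filter problem or a ranking problem, not a mix of both.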
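The two-step shape can be sketched in memory, independent of any search service. This is a minimal illustration, not a production retriever: the corpus, the field names, and the `Cosine` helper are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory corpus: (id, tenant, status, embedding).
var docs = new List<(string Id, string TenantId, string Status, double[] Vector)>
{
    ("a1", "tenant-a", "published", new[] { 0.9, 0.1 }),
    ("a2", "tenant-a", "draft",     new[] { 1.0, 0.0 }),
    ("b1", "tenant-b", "published", new[] { 1.0, 0.0 }),
    ("a3", "tenant-a", "published", new[] { 0.1, 0.9 }),
};

double[] query = { 1.0, 0.0 };

// Step 1, eligibility: a hard yes/no test that never looks at the score.
var eligible = docs.Where(d => d.TenantId == "tenant-a" && d.Status == "published");

// Step 2, relevance: order only the eligible documents by similarity.
var ranked = eligible
    .OrderByDescending(d => Cosine(d.Vector, query))
    .Select(d => d.Id)
    .ToList();

Console.WriteLine(string.Join(", ", ranked)); // prints "a1, a3"

// Cosine similarity between two vectors of equal length.
static double Cosine(double[] x, double[] y)
{
    double dot = 0, nx = 0, ny = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        nx += x[i] * x[i];
        ny += y[i] * y[i];
    }
    return dot / (Math.Sqrt(nx) * Math.Sqrt(ny));
}
```

Documents a2 and b1 match the query vector exactly, yet neither appears in the output: one is a draft and the other belongs to a different tenant. Their similarity never gets a chance to matter.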

In Azure AI Search, this boundary is visible in the query shape. The OData filter defines eligibility. Vector search and semantic ranking define relevance:

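using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Escape single quotes so the tenant ID is a valid OData string literal.
string tenant = currentTenantId.Replace("'", "''");

var options = new SearchOptions
{
    // Eligibility: hard constraints only.
    Filter = $"tenantId eq '{tenant}' and status eq 'published'",

    // Relevance: the semantic ranker reorders the top matches.
    QueryType = SearchQueryType.Semantic,
    SemanticSearch = new SemanticSearchOptions
    {
        SemanticConfigurationName = "default"
    },

    // Relevance: nearest-neighbor search over the embedding field.
    VectorSearch = new VectorSearchOptions
    {
        Queries =
        {
            new VectorizedQuery(queryVector)
            {
                KNearestNeighborsCount = 50,
                Fields = { "contentVector" }
            }
        }
    }
};

// searchText drives the keyword side of the query; queryVector drives the vector side.
SearchResults<SearchDocument> results =
    await searchClient.SearchAsync<SearchDocument>(searchText, options);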

The tenant and publication state stay in the filter. The vector query and semantic ranker can only rank documents that are eligible to be returned.
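Because the filter is the security boundary, the way it is built from caller input deserves its own small function. The sketch below isolates the quoting step from the query above; `BuildEligibilityFilter` is a hypothetical helper name, and the field names are the ones assumed in this example.

```csharp
using System;

// Hypothetical helper: builds the eligibility filter from an untrusted tenant ID.
static string BuildEligibilityFilter(string tenantId)
{
    // OData string literals escape a single quote by doubling it.
    string escaped = tenantId.Replace("'", "''");
    return $"tenantId eq '{escaped}' and status eq 'published'";
}

Console.WriteLine(BuildEligibilityFilter("contoso"));

// An input that tries to break out of the string literal stays inside it.
Console.WriteLine(BuildEligibilityFilter("x' or tenantId ne 'x"));
```

The .NET SDK also provides `SearchFilter.Create`, which takes an interpolated string and quotes and escapes string arguments itself, for example `SearchFilter.Create($"tenantId eq {currentTenantId} and status eq 'published'")`. Using it avoids hand-written escaping.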

For newer Azure AI Search vector indexes, prefiltering is the default and recommended mode. The filter is applied during vector search traversal, so the k nearest neighbors are chosen from eligible documents only, which favors recall for selective filters. Post-filtering applies the filter after the nearest neighbors have been found. It can be faster in some cases, but it can also return fewer than k results, or miss matching documents entirely, when k is small or the filter is highly selective.
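The recall difference between the two modes is easy to reproduce with a brute-force simulation. The sketch below is not how the service implements either mode (it does no graph traversal). It only contrasts the order of the two operations on a made-up corpus in which the eligible documents all score lower than the top k.

```csharp
using System;
using System.Linq;

// Hypothetical corpus: 100 documents already scored against one query.
// Only ids 24, 49, 74, and 99 are eligible, and none of them is in the top 10.
var docs = Enumerable.Range(0, 100)
    .Select(i => (Id: i, Eligible: i % 25 == 24, Score: 1.0 - i / 100.0))
    .ToList();

const int k = 10;

// Post-filtering: take the k nearest neighbors first, then drop ineligible ones.
var postFiltered = docs
    .OrderByDescending(d => d.Score)
    .Take(k)
    .Where(d => d.Eligible)
    .Select(d => d.Id)
    .ToList();

// Prefiltering: restrict to eligible documents, then take the k nearest.
var preFiltered = docs
    .Where(d => d.Eligible)
    .OrderByDescending(d => d.Score)
    .Take(k)
    .Select(d => d.Id)
    .ToList();

Console.WriteLine($"post-filter: {postFiltered.Count} results"); // 0
Console.WriteLine($"prefilter:   {preFiltered.Count} results");  // 4
```

With k set to 10, post-filtering returns nothing even though four eligible documents exist. In the .NET SDK the mode can be set explicitly through `VectorSearchOptions.FilterMode`, using `VectorFilterMode.PreFilter` or `VectorFilterMode.PostFilter`, which is worth doing for indexes whose default is not known.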