Stop Guessing – Use Golden Datasets for Prompt Evals
Use a small golden dataset to catch prompt regressions, compare changes against a baseline, and validate model updates before users do.
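The golden-dataset idea can be sketched in a few lines. This is a minimal illustration, not the post's actual code: `GoldenCase`, the exact-match scorer, and the pass-rate gate are all hypothetical names, and real evals would use a richer rubric than exact string comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch: a hand-curated input/expected pair.
public record GoldenCase(string Input, string Expected);

public static class GoldenEval
{
    // Score a candidate answer function over the golden set (0.0 to 1.0).
    // Exact match is a placeholder scorer; swap in your own rubric.
    public static double PassRate(
        IEnumerable<GoldenCase> cases,
        Func<string, string> answer) =>
        cases.Average(c =>
            string.Equals(answer(c.Input), c.Expected, StringComparison.OrdinalIgnoreCase)
                ? 1.0 : 0.0);

    // Gate a prompt change: fail the build if it scores below the baseline.
    public static bool PassesGate(double candidateRate, double baselineRate)
        => candidateRate >= baselineRate;
}
```

Wiring `PassesGate` into CI as a failing test is what turns the dataset into a regression catch rather than a one-off experiment.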
Indirect prompt injection is a trust-boundary failure; treat retrieved content as untrusted data, isolate it from instructions, and validate actions before execution.
How to reduce RAG hallucinations by short-circuiting generation when retrieval returns weak evidence, with a simple C# threshold check.
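The short-circuit pattern looks roughly like this. A minimal sketch, assuming hypothetical names throughout (`RagGuard`, the 0.75 threshold, and the placeholder generation call are illustrative, not the post's implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RagGuard
{
    // Assumed threshold; tune it against your own retrieval eval data.
    private const double MinRetrievalScore = 0.75;

    public static string Answer(IReadOnlyList<(string Chunk, double Score)> hits)
    {
        // Short-circuit: if retrieval came back empty or weak,
        // return an honest fallback instead of letting the model guess.
        if (hits.Count == 0 || hits.Max(h => h.Score) < MinRetrievalScore)
            return "I don't have enough information to answer that.";

        // Otherwise proceed to grounded generation with the retrieved chunks.
        return GenerateWithContext(hits);
    }

    // Placeholder for the actual LLM call.
    private static string GenerateWithContext(IReadOnlyList<(string, double)> hits)
        => $"<answer grounded in {hits.Count} chunk(s)>";
}
```

The check runs before any tokens are generated, so weak-evidence queries cost nothing and never produce a confident-sounding fabrication.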
Why stale documents, weak chunking, and thin metadata usually break RAG before prompt tuning does.
A repeatable local setup for timeout triage in .NET LLM workloads using Aspire, OpenTelemetry, and Ollama.
A minimal .NET starter for running local LLMs with Ollama + OllamaSharp behind IChatClient: no API keys required, with streaming chat, system prompts, and capped conversation history.
Why eval-first matters for LLM apps and how to use datasets, scoring rubrics, and CI quality gates to catch regressions early.