Here’s my thesis: observability isn’t dashboards - it’s time-to-diagnosis.

I don’t want to spend hours correlating logs by gut feeling. I want a local drill where timeouts are reproducible and diagnosable quickly.

In this post, an “LLM workload” is an endpoint where tail latency and failures often come from a model call (Ollama) plus prompt/tooling changes - not just your HTTP handler.

I’ve been there. I changed code in one of my projects, thinking it would not affect performance.

It did.

In the end, I had logs - but no consistent correlation across logs, traces, and metrics. It took me far too long to fix the problem.

That’s why I built a small local setup to make timeout triage repeatable, so I could experiment and find what works best for my needs.

This post is repo-first and uses the companion repository directly:
https://github.com/ovnecron/minimal-llm-observability

The Stack in One Minute

  • ASP.NET Core API - small request surface that I can instrument end-to-end without noise.
  • Blazor Web UI - one-click healthy, delay, timeout, and real model-call scenarios.
  • .NET Aspire AppHost - local orchestration and the Dashboard for fast pivoting.
  • Ollama (ollama/ollama:0.16.3) - real local model-call behavior without cloud token cost.
  • OpenTelemetry - logs tell me what, traces tell me where, metrics tell me how often.

The point is simple: one local environment where I can trigger failure and observe it end-to-end without guessing.


Why LLM timeouts are different

  • Prompt changes are deployments: same code, different latency/failure modes.
  • Model/runtime changes shift tail latency.
  • Tool/dependency calls amplify variance - one slow call becomes a timeout.

Minimum Correlation Fields

  • run_id to follow one request lifecycle
  • trace_id to follow execution across spans and services
  • prompt_version to tie behavior to prompt changes
  • tool_version to tie failures to integration changes

How Correlation Should Look

POST /ask -> trace_id in trace span -> run_id + trace_id in logs -> timeout metric increases

Naming convention I use:

  • snake_case in logs/JSON: run_id, trace_id, prompt_version, tool_version
  • camelCase in C# variables: runId, traceId, promptVersion, toolVersion

Example log line:

timeout during /ask run_id=9f0f2f3a6fdd4f5f9e9a1f4d8f6c6f3e trace_id=4c4f3b2e86d4d6a6b1f69a0d9d0d9f0a prompt_version=v1 tool_version=local-llm-v1

If one link in that chain is missing, triage slows down immediately.
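To make the pivot step concrete, here's a minimal sketch of "filter logs by trace_id", assuming plain-text log lines shaped like the example above. The sample lines and ids below are fabricated for illustration:

```shell
#!/bin/sh
# Fabricated log lines shaped like the example above.
cat > /tmp/llm-logs.txt <<'EOF'
request started /ask run_id=aaa111 trace_id=cafe01 prompt_version=v1 tool_version=local-llm-v1
timeout during /ask run_id=aaa111 trace_id=cafe01 prompt_version=v1 tool_version=local-llm-v1
request started /ask run_id=bbb222 trace_id=cafe02 prompt_version=v1 tool_version=local-llm-v1
EOF

# 1. Spot the timeout line and extract its trace_id.
trace_id=$(grep 'timeout during' /tmp/llm-logs.txt | sed 's/.*trace_id=\([^ ]*\).*/\1/')
echo "pivoting on trace_id=$trace_id"

# 2. Pull every log line from the same request lifecycle.
grep "trace_id=$trace_id" /tmp/llm-logs.txt
```

If the `trace_id` field is missing from even one of those lines, this one-liner pivot turns back into manual correlation by timestamp.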

What You’ll See

  • Click Simulated Timeout (504) in the Web UI.
  • llm_timeouts_total increments in Aspire Metrics.
  • A failing llm.run trace appears.
  • Logs contain run_id and trace_id.
  • llm_latency_ms shows a spike.

Prerequisites

  • Docker Desktop / Docker Engine installed and running.
  • .NET SDK from the repo’s global.json installed.
  • Aspire workload installed if required by your setup:
dotnet workload install aspire
  • Local ports available (or adjust launch settings): 18888, 18889, 11434.
  • If you use the stable API port appendix, you’ll also need 17100 free.

Step 1 - Clone and Run the Repository

git clone https://github.com/ovnecron/minimal-llm-observability.git
cd minimal-llm-observability
dotnet run --project LLMObservabilityLab.AppHost/LLMObservabilityLab.AppHost.csproj

Open the Aspire Dashboard URL printed in the terminal. If you see an auth prompt, use the one-time URL from the terminal.

This repo uses fixed local HTTP launch settings:

  • Aspire Dashboard: http://localhost:18888
  • OTLP endpoint (Aspire Dashboard): http://localhost:18889
  • Web UI (LLMObservabilityLab.Web): open from Aspire Dashboard resource list
  • Unsecured local transport is already enabled in the AppHost launch profile (ASPIRE_ALLOW_UNSECURED_TRANSPORT=true).

If you already run Ollama locally on 11434, stop it or change the container port mapping in AppHost.

If Real Ollama Call returns “model not found”, pull the default model in the running container:

docker exec -it "$(docker ps --filter "name=local-llm" --format "{{.Names}}" | head -n 1)" \
  ollama pull llama3.2:1b

Step 2 - Trigger Scenarios in the Web UI

Open Aspire Dashboard -> Resources -> click the web-ui endpoint.

The root page in LLMObservabilityLab.Web gives one-click actions:

  • Healthy Run
  • Simulate Delay
  • Real Ollama Call
  • Simulated Timeout (504)

Each run shows:

  • run_id
  • trace_id
  • status
  • elapsed time

The Web UI also includes /drill with the fixed 15-minute triage checklist.
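If you want to script scenario runs instead of clicking, the same four fields can be pulled out of a captured response. The JSON shape below is an assumption for illustration, not the repo's actual contract:

```shell
#!/bin/sh
# A captured response from a scenario run; this JSON shape is assumed, not guaranteed by the repo.
response='{"run_id":"9f0f2f3a","trace_id":"4c4f3b2e","status":"timeout","elapsed_ms":30000}'

# Extract the two correlation ids with sed (no jq dependency).
run_id=$(echo "$response" | sed 's/.*"run_id":"\([^"]*\)".*/\1/')
trace_id=$(echo "$response" | sed 's/.*"trace_id":"\([^"]*\)".*/\1/')

echo "run_id=$run_id trace_id=$trace_id"
```

With both ids in shell variables, the log pivot from the correlation section becomes scriptable end-to-end.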

Step 3 - Generate a Healthy Baseline (Optional)

Click Healthy Run ~20 times in the Web UI.

This gives you a quick baseline in llm_runs_total, llm_success_total, and llm_latency_ms before you force a timeout.
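Once the baseline exists, a rough p95 is enough to eyeball later spikes. A quick sketch with standard tools, assuming you've copied the llm_latency_ms sample values (one per line, in ms) into a file; the numbers below are made up:

```shell
#!/bin/sh
# Made-up llm_latency_ms samples from ~20 healthy runs, one value per line.
cat > /tmp/latency-ms.txt <<'EOF'
210
190
205
260
200
195
220
215
198
202
EOF

# Nearest-rank p95: sort ascending, take the ceil(0.95 * n)-th value.
p95=$(sort -n /tmp/latency-ms.txt | awk '
  { v[NR] = $1 }
  END { rank = int(0.95 * NR + 0.999999); print v[rank] }')
echo "baseline p95=${p95}ms"
```

When the timeout scenario later pushes llm_latency_ms far above this number, the spike is obvious instead of debatable.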

Step 4 - Force a Timeout and Triage It

Click Simulated Timeout (504) in the Web UI, then move directly to the Aspire Dashboard. The scenario returns a controlled 504, so the full observability pipeline is exercised on demand.

My triage loop (target: ~15 minutes in this lab):

  • Spot: check llm_timeouts_total in Metrics
  • Drill: open the failing llm.run trace
  • Pivot: filter logs by trace_id and run_id
  • Inspect: compare prompt_version and tool_version
  • Mitigate: smallest safe fix first
  • Verify: rerun timeout scenario and confirm recovery

Screenshot flow:

  • Metrics -> llm_latency_ms (see the spike)
  • Traces -> filter scenario=simulate_timeout
  • Open the failing llm.run

Minimal Signals I Use to Make Fast Decisions

Directly emitted by this repo:

  • llm_runs_total
  • llm_success_total
  • llm_timeouts_total
  • llm_errors_total
  • llm_latency_ms

Derived metric:

task_success_rate = llm_success_total / llm_runs_total * 100

Starter alert heuristics (seeds - tune to your baseline):

  • task_success_rate drop > 5 percentage points (pp) in 30 minutes
  • latency percentile degradation (derived from llm_latency_ms) > 30% over baseline
  • tool_version-scoped success (derived from runs tagged with tool_version) < 90%
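The derived metric and the first heuristic can be checked mechanically. A sketch with shell arithmetic, using made-up counter snapshots:

```shell
#!/bin/sh
# Made-up counter snapshots: current window vs. a 30-minute-old baseline.
runs=200; success=180   # current llm_runs_total, llm_success_total
baseline_rate=97        # task_success_rate 30 minutes ago, in percent

# task_success_rate = llm_success_total / llm_runs_total * 100
rate=$(awk -v s="$success" -v r="$runs" 'BEGIN { printf "%.1f", s / r * 100 }')
echo "task_success_rate=${rate}%"

# Heuristic: flag a drop of more than 5 percentage points against baseline.
alert=$(awk -v now="$rate" -v base="$baseline_rate" \
  'BEGIN { if (base - now > 5) print "ALERT"; else print "ok" }')
echo "success-rate check: $alert"
```

Here 180/200 gives 90.0%, a 7 pp drop from the 97% baseline, so the check fires. The thresholds themselves stay seeds to tune, as noted above.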

Troubleshooting

  • Port 11434 in use: stop local Ollama or change the AppHost port mapping.
  • No traces/metrics: verify the Aspire Dashboard is running and the OTLP endpoint is reachable.
  • Model not found: run the ollama pull ... command inside the container.
  • CLI/API calls fail: copy the exact API endpoint from Aspire Dashboard (llm-api -> Endpoints).

Verified vs Opinion (so you know what to trust)

Verified (reproducible in this repo):

  • the scenarios (healthy/delay/timeout/real call) are triggered from the Web UI
  • the correlation chain exists: metric counters -> llm.run traces -> logs with run_id/trace_id

Opinion (works well for me, tune as needed):

  • the “15-minute” target loop
  • the alert thresholds above (they’re starter seeds, not universal truth)
  • the exact four correlation fields (use more if your system needs them)

Final Thoughts

The goal isn’t perfect dashboards. It’s shrinking time to diagnosis.

If you can’t pivot from a timeout to the exact trace and log lines, you’re still guessing.

I used this lab to find a workflow that works for me, and I hope it helps you set up an observability pipeline that works for you.

If you run into an issue, open a GitHub issue and I’ll be happy to help.

Official References