Performance work should start with evidence. Before changing code, decide what is actually slow: latency, throughput, allocations, CPU, database time, lock contention, cold start, serialization, network calls, model latency, or queue depth.
Different problems need different tools. Use traces to understand request paths, metrics to see trends, counters to inspect runtime pressure, logs to explain rare cases, profilers for CPU and allocations, and focused benchmarks for small code paths. In .NET, that usually means starting with dotnet-counters for System.Runtime counters such as GC pressure, allocation rate, CPU, lock contention, and ThreadPool queue length. Use dotnet-trace when you need a CPU trace instead of another guess.
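When attaching dotnet-counters is not yet an option, a rough in-process snapshot can show the same kind of runtime pressure. The sketch below is illustrative only: `RuntimeSnapshot` and `Capture` are hypothetical names, it assumes .NET Core 3.0 or later, and the values only roughly correspond to what the System.Runtime counters report.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of any library: reads a few runtime values
// that roughly correspond to the System.Runtime counters.
public static class RuntimeSnapshot
{
    public static string Capture()
    {
        return string.Join(
            Environment.NewLine,
            // Number of gen 0 and gen 2 collections since process start.
            $"gen0 collections: {GC.CollectionCount(0)}",
            $"gen2 collections: {GC.CollectionCount(2)}",
            // Total bytes allocated over the lifetime of the process.
            $"allocated bytes: {GC.GetTotalAllocatedBytes()}",
            // Work items queued to the ThreadPool but not yet started.
            $"threadpool queue length: {ThreadPool.PendingWorkItemCount}",
            $"threadpool threads: {ThreadPool.ThreadCount}",
            // Times a thread had to wait to acquire a monitor lock.
            $"lock contentions: {Monitor.LockContentionCount}");
    }
}
```

Calling `RuntimeSnapshot.Capture()` before and after a load test gives two snapshots to compare. It is a starting point for deciding where to look, not a replacement for dotnet-counters or a trace.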
This matters because many “obvious” optimizations move the cost instead of reducing it. Caching can increase memory pressure. Parallelism can saturate dependencies. Async wrappers can hide CPU work. A faster query can still return too much data. A smaller prompt can make a model less reliable.
Measure the baseline, make one change, then measure again with the same scenario. If the change does not move the metric you care about, remove it.
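The loop above (baseline, one change, same scenario) can be sketched with a small helper. This is a minimal illustration, not a substitute for a profiler or BenchmarkDotNet: `ScenarioMeasurement` and `Measure` are hypothetical names, and the helper only counts allocations made on the calling thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library: runs one scenario repeatedly
// and reports elapsed time and bytes allocated on the calling thread.
public static class ScenarioMeasurement
{
    public static (TimeSpan Elapsed, long AllocatedBytes) Measure(
        Action scenario, int iterations = 1_000)
    {
        // Run once first so first-call JIT cost is not charged to the measurement.
        scenario();

        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            scenario();
        }

        stopwatch.Stop();
        long bytesAfter = GC.GetAllocatedBytesForCurrentThread();

        return (stopwatch.Elapsed, bytesAfter - bytesBefore);
    }
}
```

Call `Measure` with the current code, record the result, apply one change, then call it again with the same scenario and iteration count. Comparing the two results shows whether the metric you care about actually moved.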
For application code, prefer production-like measurements over tiny synthetic examples. For library or hot-path code, use BenchmarkDotNet to isolate the operation and compare one change at a time:
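```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Entry point: build and run in Release configuration.
BenchmarkRunner.Run<CsvFirstFieldBenchmark>();

[MemoryDiagnoser] // report allocations alongside timings
public class CsvFirstFieldBenchmark
{
    private const string Data = "value1,value2,value3";

    // Allocates a string[] plus one string per field.
    [Benchmark(Baseline = true)]
    public int UsingSplit() => Data.Split(',')[0].Length;

    // Slices the original string without allocating.
    [Benchmark]
    public int UsingSpan()
    {
        ReadOnlySpan<char> span = Data;
        return span[..span.IndexOf(',')].Length;
    }
}
```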
[MemoryDiagnoser] matters because a version that looks faster may still allocate too much. Baseline = true marks UsingSplit as the reference, so the other results are reported relative to it and the comparison is explicit. The goal is not to make every decision heavy. The goal is to stop optimizing the part that only felt suspicious in review.