A CancellationToken only helps if the expensive work actually receives it. In an ASP.NET Core API, cancellation often starts at the request boundary: the client disconnects, a timeout expires, or the caller no longer needs the result.
If the controller, endpoint, or handler accepts the token but a service method drops it, the lower-level operation may keep running anyway. That can leave a database query, HTTP call, storage request, queue operation, or model call consuming resources after the original request has already been cancelled.
In minimal APIs, ASP.NET Core binds a CancellationToken handler parameter to the request's cancellation token (HttpContext.RequestAborted) for you. Pass it into the expensive call instead of stopping at the endpoint signature:
app.MapPost("/api/generate", async (
    ChatRequest request,
    IChatClient chatClient,
    CancellationToken ct) =>
{
    var response = await chatClient.GetResponseAsync(
        request.Messages,
        options: null,
        cancellationToken: ct);
    return Results.Ok(response.Text);
});
For AI calls, this is not just about freeing a thread. If the client disconnects during generation and the token is dropped, the model can keep producing tokens nobody will read.
Treat cancellation as part of the method contract for real I/O boundaries. If a method performs work that can take noticeable time, accept a CancellationToken and pass it to the next async operation. Do not replace it with CancellationToken.None unless you intentionally want the operation to outlive the caller.
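The difference between dropping and forwarding a token can be sketched in a few lines. This is a minimal illustration, not code from a real service: ReportService and its method names are hypothetical, and Task.Delay stands in for a slow database or HTTP call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical service, used only to contrast the two patterns.
public sealed class ReportService
{
    // Stand-in for slow I/O (assumption: real code would call a database or HTTP API here).
    private static Task SlowOperationAsync(CancellationToken ct)
        => Task.Delay(TimeSpan.FromSeconds(5), ct);

    // Accepts the token but drops it: the work keeps running after the caller cancels.
    public Task DroppedAsync(CancellationToken ct)
        => SlowOperationAsync(CancellationToken.None);

    // Forwards the token: the work is cancelled together with the caller.
    public Task ForwardedAsync(CancellationToken ct)
        => SlowOperationAsync(ct);
}
```

Both methods have the same signature, so the dropped token is invisible to the caller. Only the forwarded version stops when the request is cancelled.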
For your own service interfaces, make the token part of the contract. Give it a default value (= default, which is equivalent to CancellationToken.None) when callers should not be forced to pass one:
public interface IDocumentProcessor
{
    Task ProcessAsync(string documentId, CancellationToken ct = default);
}
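One possible implementation of that interface is sketched below. FileDocumentProcessor, the root directory, and the ".txt" / ".out" file naming are assumptions made for illustration; the point is only that the token reaches every async call. The interface is repeated so the sketch compiles on its own.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Repeated from the text so this sketch is self-contained.
public interface IDocumentProcessor
{
    Task ProcessAsync(string documentId, CancellationToken ct = default);
}

// Hypothetical implementation: reads a document from disk and writes an upper-cased copy.
public sealed class FileDocumentProcessor : IDocumentProcessor
{
    private readonly string _root;

    public FileDocumentProcessor(string root) => _root = root;

    public async Task ProcessAsync(string documentId, CancellationToken ct = default)
    {
        var path = Path.Combine(_root, documentId + ".txt");

        // Each I/O call receives the same token, so cancellation stops the chain at any step.
        var text = await File.ReadAllTextAsync(path, ct);
        await File.WriteAllTextAsync(path + ".out", text.ToUpperInvariant(), ct);
    }
}
```

A request handler would call ProcessAsync(id, ct) with its request token. A caller that has no token to pass, such as a one-off command-line tool, can call ProcessAsync(id) and get a token that never cancels.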
Cancellation is cooperative. It is not a magic thread abort. The code doing the work has to observe the token, either by passing it into APIs that support cancellation or by checking it at safe points in long-running logic.
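For logic that has no cancellable API to call, checking at safe points might look like the sketch below. The Checksum class and the interval of 1024 items are illustrative assumptions. The right interval depends on how expensive each iteration is.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical CPU-bound helper that observes the token itself.
public static class Checksum
{
    public static long Sum(IReadOnlyList<int> values, CancellationToken ct)
    {
        long total = 0;
        for (var i = 0; i < values.Count; i++)
        {
            // Safe point: no partial state needs cleanup here, so throwing is harmless.
            // Checking every 1024 items keeps the overhead low (the interval is an assumption).
            if (i % 1024 == 0)
            {
                ct.ThrowIfCancellationRequested();
            }

            total += values[i];
        }

        return total;
    }
}
```

ThrowIfCancellationRequested throws OperationCanceledException once cancellation has been requested, which is the same exception type that cancellable framework APIs surface, so callers can handle both cases the same way.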