This is a minimal boilerplate to run Large Language Models (LLMs) locally on your machine using C#, Microsoft.Extensions.AI, and OllamaSharp. No API keys, no credit cards, and no prompts leave your machine (when using local Ollama).
Repository: https://github.com/ovnecron/dotnet-local-llm-starter
🛠 Prerequisites
- Ollama: The engine that runs the models.
- Download Ollama from https://ollama.com.
- Once installed, run this in your terminal to download the model:

  ```bash
  ollama pull llama3.2:1b
  ```
- Verify Ollama is reachable:

  ```bash
  ollama list
  ```
- If you get a connection error, start it:

  ```bash
  ollama serve
  ```
- .NET 8 SDK or later.
🚀 Quick Start
- Clone the repo:

  ```bash
  git clone https://github.com/ovnecron/dotnet-local-llm-starter.git
  cd dotnet-local-llm-starter
  ```
- Run the app:

  ```bash
  dotnet run --project src/LocalLLM.Console/LocalLLM.Console.csproj
  ```
- Run with a different model (pull it first: `ollama pull llama3.2:3b`):

  ```bash
  OLLAMA_MODEL="llama3.2:3b" dotnet run --project src/LocalLLM.Console/LocalLLM.Console.csproj
  ```
📦 Dependencies
If you cloned the repo, you can skip this - the packages are already referenced in the project file.
```bash
dotnet add src/LocalLLM.Console/LocalLLM.Console.csproj package Microsoft.Extensions.AI
dotnet add src/LocalLLM.Console/LocalLLM.Console.csproj package OllamaSharp
```
🧠 How it Works
1) Ollama runs the model - your app talks HTTP
When you run `ollama pull llama3.2:1b`, the model weights are stored locally. Ollama then runs a local server (default `http://localhost:11434`) with an HTTP API your app can call.
Your console app does not load weights or run inference itself - it just sends chat requests to Ollama and prints the response.
That’s why the app has an endpoint setting:
```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

var rawOllamaEndpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434";
var ollamaEndpoint = rawOllamaEndpoint.EndsWith('/') ? rawOllamaEndpoint : rawOllamaEndpoint + "/";
```
2) IChatClient is the abstraction that keeps your code clean
Microsoft.Extensions.AI defines IChatClient as a provider-agnostic interface for “chat with a model”.
In this repo we use OllamaSharp’s OllamaApiClient as the concrete implementation:
```csharp
IChatClient client = new OllamaApiClient(new Uri(ollamaEndpoint), modelName);
```
The important bit: everything after this line depends only on `IChatClient`.
So later you can swap “local Ollama” for “cloud provider” by changing only how IChatClient is created / registered (DI), while your chat loop stays the same.
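For instance, with Microsoft.Extensions.DependencyInjection the swap point might look like this - a minimal sketch, not code from the repo, reusing the endpoint and model defaults shown earlier:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using OllamaSharp;

var services = new ServiceCollection();

// The only Ollama-specific line: register OllamaApiClient behind IChatClient.
// Swapping to another provider later means changing only this registration.
services.AddSingleton<IChatClient>(_ =>
    new OllamaApiClient(new Uri("http://localhost:11434/"), "llama3.2:1b"));

var provider = services.BuildServiceProvider();
var client = provider.GetRequiredService<IChatClient>();
// Everything downstream (the chat loop) only ever sees IChatClient.
```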
3) Chat history makes it a real conversation
LLMs are stateless. If you want the model to remember context, you send the full conversation history on each call.
This app keeps a simple in-memory list:

```csharp
var chatHistory = new List<ChatMessage>();
```
Every time you type something, it gets appended:

```csharp
chatHistory.Add(new ChatMessage(ChatRole.User, input));
```
And after streaming completes, the assistant message is also stored:

```csharp
chatHistory.Add(new ChatMessage(ChatRole.Assistant, assistantText.ToString()));
```
So the next request includes everything that happened so far.
4) Streaming response = token-by-token output
Instead of waiting for the full response, we stream updates and print them as they arrive:
```csharp
var assistantText = new StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync(chatHistory))
{
    if (!string.IsNullOrEmpty(update.Text))
    {
        assistantText.Append(update.Text);
        Console.Write(update.Text);
    }
}
```
What’s happening here:
- `GetStreamingResponseAsync(...)` yields a sequence of incremental updates.
- `update.Text` contains the new text "chunks" (often token-ish).
- We print each chunk immediately for a snappy CLI feel.
- At the same time, we accumulate the chunks in `assistantText` so we can store the final assistant reply in `chatHistory`.
Each loop iteration sends the full conversation history again (models are stateless), so for long sessions you should cap or summarize history.
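One way to cap it is to drop the oldest turns before each request. A minimal sketch, using a simplified `Msg` record in place of `ChatMessage` so it stays self-contained (the real app would apply the same logic to `List<ChatMessage>` with `ChatRole` checks):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for Microsoft.Extensions.AI's ChatMessage.
public record Msg(string Role, string Text);

public static class HistoryTrimmer
{
    // Keep any system messages plus the last maxTurns user/assistant pairs.
    public static List<Msg> Trim(List<Msg> history, int maxTurns)
    {
        var system = history.Where(m => m.Role == "system").ToList();
        var rest = history.Where(m => m.Role != "system").ToList();

        // One turn = one user message + one assistant reply.
        var keep = Math.Min(rest.Count, maxTurns * 2);
        return system.Concat(rest.Skip(rest.Count - keep)).ToList();
    }
}
```

Summarizing old turns into a single message is the other common option; trimming is simpler and usually enough for a starter.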
5) Configuration via environment variables
You can change model and endpoint without touching code:
- `OLLAMA_MODEL` (defaults to `llama3.2:1b`)
- `OLLAMA_ENDPOINT` (defaults to `http://localhost:11434`)
- `OLLAMA_SYSTEM_PROMPT` (optional)
- `OLLAMA_MAX_TURNS` (defaults to `10`)
That’s handy when you want to test different models or run Ollama on another machine.
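Reading those variables amounts to a few fallbacks. A sketch of how the defaults above might be applied (the `Config.Load` helper is illustrative, not the repo's API):

```csharp
using System;

public static class Config
{
    // Apply the documented defaults when a variable is missing or invalid.
    public static (string Model, string Endpoint, int MaxTurns) Load()
    {
        var model = Environment.GetEnvironmentVariable("OLLAMA_MODEL") ?? "llama3.2:1b";
        var endpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434";

        // Fall back to 10 turns if OLLAMA_MAX_TURNS is unset or not a positive number.
        var maxTurns = int.TryParse(
            Environment.GetEnvironmentVariable("OLLAMA_MAX_TURNS"), out var n) && n > 0 ? n : 10;

        return (model, endpoint, maxTurns);
    }
}
```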
6) Friendly failure modes
The app catches the common “model not found” case and tells you exactly what to do:
- If Ollama doesn’t have the model: run `ollama pull <model>`.
- If requests fail: check that Ollama is running and the model exists.
This keeps the starter template beginner-proof while still being minimal.
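The shape of that handling is a try/catch around the streaming call. A sketch that works on any `IAsyncEnumerable<string>` so it stays self-contained - in the real app the stream comes from `client.GetStreamingResponseAsync(chatHistory)`, and OllamaSharp's exact exception types may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FriendlyErrors
{
    // Stream chunks to the console; turn connection failures into a helpful
    // hint instead of a stack trace. Returns false if the request failed.
    public static async Task<bool> TryStreamAsync(IAsyncEnumerable<string> stream)
    {
        try
        {
            await foreach (var chunk in stream)
            {
                Console.Write(chunk);
            }
            return true;
        }
        catch (HttpRequestException)
        {
            Console.WriteLine("Could not reach Ollama. Is it running? Try: ollama serve");
            return false;
        }
    }
}
```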
🔧 Troubleshooting
- Connection refused / request failed: start Ollama (`ollama serve`) and check `OLLAMA_ENDPOINT`.
- Model not found: `ollama pull <model>` and retry.
- Slow first response: normal; model loading/warmup can take a moment.