This is a minimal boilerplate to run Large Language Models (LLMs) locally on your machine using C#, Microsoft.Extensions.AI, and OllamaSharp. No API keys, no credit cards, and no prompts leave your machine (when using local Ollama).
Repository: https://github.com/ovnecron/dotnet-local-llm-starter
🛠 Prerequisites
- Ollama: The engine that runs the models.
- Download Ollama from https://ollama.com.
- Once installed, run this in your terminal to download the model:

  ```bash
  ollama pull llama3.2:1b
  ```
- Verify Ollama is reachable:

  ```bash
  ollama list
  ```
- If you get a connection error, start it:

  ```bash
  ollama serve
  ```
- .NET 8 SDK or later.
🚀 Quick Start
- Clone the repo:

  ```bash
  git clone https://github.com/ovnecron/dotnet-local-llm-starter.git
  cd dotnet-local-llm-starter
  ```
- Run the app:

  ```bash
  dotnet run --project src/LocalLLM.Console/LocalLLM.Console.csproj
  ```
- Run with a different model (pull it first: `ollama pull llama3.2:3b`):

  ```bash
  OLLAMA_MODEL="llama3.2:3b" dotnet run --project src/LocalLLM.Console/LocalLLM.Console.csproj
  ```
📦 Dependencies
If you cloned the repo, you can skip this - the packages are already referenced in the project file.
```bash
dotnet add src/LocalLLM.Console/LocalLLM.Console.csproj package Microsoft.Extensions.AI
dotnet add src/LocalLLM.Console/LocalLLM.Console.csproj package OllamaSharp
```
🧠 How it Works
1) Ollama runs the model - your app talks HTTP
When you run `ollama pull llama3.2:1b`, the model weights are stored locally. Ollama then runs a local server (default `http://localhost:11434`) with an HTTP API your app can call.
Your console app does not load weights or run inference itself - it just sends chat requests to Ollama and prints the response.
That’s why the app has an endpoint setting:
```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

var rawOllamaEndpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434";
var ollamaEndpoint = rawOllamaEndpoint.EndsWith('/') ? rawOllamaEndpoint : rawOllamaEndpoint + "/";
```
2) IChatClient is the abstraction that keeps your code clean
Microsoft.Extensions.AI defines IChatClient as a provider-agnostic interface for “chat with a model”.
In this repo we use OllamaSharp’s OllamaApiClient as the concrete implementation:
```csharp
IChatClient client = new OllamaApiClient(new Uri(ollamaEndpoint), modelName);
```
The important bit: everything after this line depends only on `IChatClient`.
So later you can swap “local Ollama” for “cloud provider” by changing only how IChatClient is created / registered (DI), while your chat loop stays the same.
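For instance, with Microsoft.Extensions.DependencyInjection the swap point might look like this - a minimal sketch, not code from the repo, reusing the endpoint and model defaults shown earlier:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using OllamaSharp;

var services = new ServiceCollection();

// The only Ollama-specific line: register OllamaApiClient behind IChatClient.
// Swapping to another provider later means changing only this registration.
services.AddSingleton<IChatClient>(_ =>
    new OllamaApiClient(new Uri("http://localhost:11434/"), "llama3.2:1b"));

var provider = services.BuildServiceProvider();
var client = provider.GetRequiredService<IChatClient>();
// Everything downstream (the chat loop) only ever sees IChatClient.
```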
3) Chat history makes it a real conversation
LLMs are stateless. If you want the model to remember context, you send the full conversation history on each call.
This app keeps a simple in-memory list:

```csharp
var chatHistory = new List<ChatMessage>();
```
Every time you type something, it gets appended:

```csharp
chatHistory.Add(new ChatMessage(ChatRole.User, input));
```
And after streaming completes, the assistant message is also stored:

```csharp
chatHistory.Add(new ChatMessage(ChatRole.Assistant, assistantText.ToString()));
```
So the next request includes everything that happened so far.
4) Streaming response = token-by-token output
Instead of waiting for the full response, we stream updates and print them as they arrive:
```csharp
var assistantText = new StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync(chatHistory))
{
    if (!string.IsNullOrEmpty(update.Text))
    {
        assistantText.Append(update.Text);
        Console.Write(update.Text);
    }
}
```
What’s happening here:
- `GetStreamingResponseAsync(...)` yields a sequence of incremental updates.
- `update.Text` contains the new text "chunks" (often token-ish).
- We print each chunk immediately for a snappy CLI feel.
- At the same time, we accumulate the chunks in `assistantText` so we can store the final assistant reply in `chatHistory`.
Each loop iteration sends the full conversation history again (models are stateless), so for long sessions you should cap or summarize history.
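One way to cap it is to drop the oldest turns before each request. A minimal sketch, using a simplified `Msg` record in place of `ChatMessage` so it stays self-contained (the real app would apply the same logic to `List<ChatMessage>` with `ChatRole` checks):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for Microsoft.Extensions.AI's ChatMessage.
public record Msg(string Role, string Text);

public static class HistoryTrimmer
{
    // Keep any system messages plus the last maxTurns user/assistant pairs.
    public static List<Msg> Trim(List<Msg> history, int maxTurns)
    {
        var system = history.Where(m => m.Role == "system").ToList();
        var rest = history.Where(m => m.Role != "system").ToList();

        // One turn = one user message + one assistant reply.
        var keep = Math.Min(rest.Count, maxTurns * 2);
        return system.Concat(rest.Skip(rest.Count - keep)).ToList();
    }
}
```

Summarizing old turns into a single message is the other common option; trimming is simpler and usually enough for a starter.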
5) Configuration via environment variables
You can change model and endpoint without touching code:
- `OLLAMA_MODEL` (defaults to `llama3.2:1b`)
- `OLLAMA_ENDPOINT` (defaults to `http://localhost:11434`)
- `OLLAMA_SYSTEM_PROMPT` (optional)
- `OLLAMA_MAX_TURNS` (defaults to `10`)
That’s handy when you want to test different models or run Ollama on another machine.
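Reading those variables amounts to a few fallbacks. A sketch of how the defaults above might be applied (the `Config.Load` helper is illustrative, not the repo's API):

```csharp
using System;

public static class Config
{
    // Apply the documented defaults when a variable is missing or invalid.
    public static (string Model, string Endpoint, int MaxTurns) Load()
    {
        var model = Environment.GetEnvironmentVariable("OLLAMA_MODEL") ?? "llama3.2:1b";
        var endpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434";

        // Fall back to 10 turns if OLLAMA_MAX_TURNS is unset or not a positive number.
        var maxTurns = int.TryParse(
            Environment.GetEnvironmentVariable("OLLAMA_MAX_TURNS"), out var n) && n > 0 ? n : 10;

        return (model, endpoint, maxTurns);
    }
}
```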
6) Friendly failure modes
The app catches the common “model not found” case and tells you exactly what to do:
- If Ollama doesn’t have the model: run `ollama pull <model>`.
- If requests fail: check that Ollama is running and the model exists.
This keeps the starter template beginner-proof while still being minimal.
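The shape of that handling is a try/catch around the streaming call. A sketch that works on any `IAsyncEnumerable<string>` so it stays self-contained - in the real app the stream comes from `client.GetStreamingResponseAsync(chatHistory)`, and OllamaSharp's exact exception types may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FriendlyErrors
{
    // Stream chunks to the console; turn connection failures into a helpful
    // hint instead of a stack trace. Returns false if the request failed.
    public static async Task<bool> TryStreamAsync(IAsyncEnumerable<string> stream)
    {
        try
        {
            await foreach (var chunk in stream)
            {
                Console.Write(chunk);
            }
            return true;
        }
        catch (HttpRequestException)
        {
            Console.WriteLine("Could not reach Ollama. Is it running? Try: ollama serve");
            return false;
        }
    }
}
```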
🔧 Troubleshooting
- Connection refused / request failed: start Ollama (`ollama serve`) and check `OLLAMA_ENDPOINT`.
- Model not found: `ollama pull <model>` and retry.
- Slow first response: normal; model loading/warmup can take a moment.