# Unreal: Getting Started

This guide walks you through setting up the Atelico AI Engine in an Unreal Engine 5 project and building an interactive NPC dialogue system with streaming text.
## What You'll Build

An actor that sends messages to an LLM and displays the response token-by-token in a UMG text widget — the foundation for any AI-driven NPC dialogue system.
By the end, you'll understand:
- How to install the plugin and configure it in Project Settings
- How to access the AI subsystem from any actor
- How to make a blocking chat request
- How to stream tokens with delegates for real-time dialogue
- How to maintain conversation history
## Prerequisites

- Unreal Engine 5.3 or later
- The Atelico server bundle (`atelico-server` binary + `atelico-asset-downloader`)
- A downloaded model:

  ```bash
  ./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
  ```
## Step 1: Install the Plugin

### Copy the Plugin

Copy the `AtelicoAI` folder into your project's `Plugins/` directory:

```
YourProject/
├── Plugins/
│   └── AtelicoAI/          <-- copy from plugins/unreal/AtelicoAI/
│       ├── AtelicoAI.uplugin
│       └── Source/
├── Source/
└── YourProject.uproject
```
### Add the Native Library

The plugin needs the compiled `atelico_ffi` native library. Copy it to the plugin's `ThirdParty` directory:

| Platform | File | Destination |
|---|---|---|
| Windows (CUDA) | `atelico_ffi.dll` + `atelico_ffi.dll.lib` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Win64/` |
| macOS | `libatelico_ffi.a` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Mac/` |
| Linux | `libatelico_ffi.a` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Linux/` |

On Windows with CUDA, also copy the CUDA runtime DLLs (`cublas64_12.dll`, `cublasLt64_12.dll`, `curand64_10.dll`) next to `atelico_ffi.dll`.
### Regenerate Project Files

```bash
# Windows
./GenerateProjectFiles.bat

# macOS
./GenerateProjectFiles.sh
```

Rebuild the project. The plugin module `AtelicoAIRuntime` compiles and links against the native library automatically.
### Add Module Dependency

In your game module's `.Build.cs`, add the dependency:

```csharp
PublicDependencyModuleNames.AddRange(new string[]
{
    "AtelicoAIRuntime"
});
```
## Step 2: Configure in Project Settings

Open **Edit > Project Settings > Atelico AI**. The key settings:

| Setting | Recommended Value | Description |
|---|---|---|
| Default Model ID | `meta-llama/Llama-3.2-3B-Instruct-Q4_K_M` | Model loaded on startup |
| Scheduling Mode | `Balance` | How GPU time is split between rendering and inference |
| VRAM Budget (MB) | `0` (unlimited) | Set a cap if your game uses significant VRAM |
The engine initializes automatically when the Game Instance is created — no manual setup code needed.
## Step 3: Access the AI Subsystem

The plugin uses UE5's `UGameInstanceSubsystem` pattern. Access it from any actor:

```cpp
#include "AtelicoAISubsystem.h"

void AMyActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI)
    {
        UE_LOG(LogTemp, Error, TEXT("Atelico AI subsystem not available"));
        return;
    }

    // AI is ready to use
}
```
The subsystem:
- Is created automatically with the Game Instance
- Persists across level transitions
- Ticks every frame (handles stream polling internally)
- Is destroyed when the game exits
## Step 4: Blocking Chat Request

The simplest way to get a response — blocks the game thread until complete:

```cpp
void AMyActor::AskNPC()
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();

    FString RequestJson = TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are a friendly tavern keeper named Boris. Keep responses under 2 sentences."},
            {"role": "user", "content": "What's on the menu today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })");

    FString ResponseJson = AI->ChatCompletion(RequestJson);
    UE_LOG(LogTemp, Log, TEXT("Response: %s"), *ResponseJson);
}
```
Blocking calls freeze the game thread. Use them for quick tasks (embeddings, classification) or during loading screens. For dialogue during gameplay, use streaming (next step).
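If a blocking call is unavoidable mid-game, run it off the game thread and collect the result later. A minimal plain-C++ sketch of that pattern, with `std::async` standing in for UE's thread-pool async facilities and `ChatCompletionBlocking` as a hypothetical stand-in for the subsystem call:

```cpp
#include <future>
#include <string>

// Hypothetical stand-in for the blocking AI->ChatCompletion() call.
std::string ChatCompletionBlocking(const std::string& requestJson) {
    // Pretend the model thought for a while and answered.
    return "{\"content\":\"Stew and black bread.\"}";
}

// Launch the blocking call on a worker thread so the caller (the "game
// thread") keeps ticking. In engine code, UE's thread-pool async utilities
// play this role instead of std::async.
std::future<std::string> ChatCompletionAsync(const std::string& requestJson) {
    return std::async(std::launch::async, ChatCompletionBlocking, requestJson);
}
```

The caller polls or waits on the future at a convenient point (for example, once per frame with `wait_for(0ms)`) rather than stalling the frame where the request was issued.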
## Step 5: Streaming with Delegates

Streaming returns tokens as they're generated, using UE5's delegate system:

```cpp
// MyNPCActor.h
#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "AtelicoAISubsystem.h"
#include "MyNPCActor.generated.h"

UCLASS()
class AMyNPCActor : public AActor
{
    GENERATED_BODY()

public:
    virtual void BeginPlay() override;

    UFUNCTION(BlueprintCallable)
    void SayToNPC(const FString& PlayerMessage);

private:
    UFUNCTION()
    void OnTokenReceived(const FString& Token, const FString& Accumulated);

    UFUNCTION()
    void OnChatCompleted(const FString& FullResponse);

    UFUNCTION()
    void OnChatFailed(const FString& Error);

    // The UI text widget — assign in Blueprint
    UPROPERTY(EditAnywhere, BlueprintReadWrite, meta = (AllowPrivateAccess))
    class UTextBlock* DialogueText;
};
```
```cpp
// MyNPCActor.cpp
#include "MyNPCActor.h"
#include "Components/TextBlock.h"

void AMyNPCActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (AI)
    {
        // Bind to the streaming delegates
        AI->OnTokenReceived.AddDynamic(this, &AMyNPCActor::OnTokenReceived);
        AI->OnChatCompleted.AddDynamic(this, &AMyNPCActor::OnChatCompleted);
        AI->OnChatFailed.AddDynamic(this, &AMyNPCActor::OnChatFailed);
    }
}

void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Clear the dialogue display
    if (DialogueText)
    {
        DialogueText->SetText(FText::GetEmpty());
    }

    // Build the request
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": "%s"}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *PlayerMessage.ReplaceCharWithEscapedChar());

    // Start streaming — returns immediately
    bool bStarted = AI->ChatCompletionStream(RequestJson);
    if (!bStarted)
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to start streaming"));
    }
}

void AMyNPCActor::OnTokenReceived(const FString& Token, const FString& Accumulated)
{
    // Called on the game thread as new tokens arrive;
    // Accumulated contains the full text so far
    if (DialogueText)
    {
        DialogueText->SetText(FText::FromString(Accumulated));
    }
}

void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    UE_LOG(LogTemp, Log, TEXT("NPC finished speaking"));
}

void AMyNPCActor::OnChatFailed(const FString& Error)
{
    UE_LOG(LogTemp, Error, TEXT("Inference error: %s"), *Error);
}
```
How it works internally:

- `ChatCompletionStream()` starts inference and stores the stream ID
- The subsystem's `Tick()` calls `atelico_engine_on_frame()` each frame
- It polls the stream for new tokens via `atelico_stream_poll()`
- When data arrives, it broadcasts `OnTokenReceived` with the new token and the accumulated text
- When the stream finishes, it broadcasts `OnChatCompleted` and cleans up

All callbacks fire on the game thread — no thread synchronization needed.
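The Tick-driven poll-and-broadcast pattern can be sketched in plain C++. The names below are illustrative, not the plugin's actual internals: a local queue stands in for the native `atelico_stream_poll()` call.

```cpp
#include <functional>
#include <queue>
#include <string>

// Minimal sketch of a Tick-driven token stream. The real subsystem polls
// the native FFI for tokens instead of reading a local queue.
struct FakeTokenStream {
    std::queue<std::string> pending;   // tokens the "engine" produced
    std::string accumulated;           // full text so far

    std::function<void(const std::string&, const std::string&)> onToken;
    std::function<void(const std::string&)> onCompleted;

    // Called once per frame, on the game thread, from Tick().
    void PollFrame() {
        while (!pending.empty()) {
            std::string token = pending.front();
            pending.pop();
            accumulated += token;
            if (onToken) onToken(token, accumulated);  // ~ OnTokenReceived
        }
    }

    // Called when the stream reports completion.
    void Finish() {
        if (onCompleted) onCompleted(accumulated);     // ~ OnChatCompleted
    }
};
```

Because `PollFrame()` only ever runs from `Tick()`, every callback fires on the game thread, which is exactly why the delegate handlers above can touch UMG widgets without locking.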
## Step 6: Multi-Turn Conversation

To maintain conversation history, build up the messages array across turns:
```cpp
// In your NPC actor header:
TArray<FString> ConversationMessages;

// Initialize in BeginPlay:
ConversationMessages.Add(TEXT(R"({"role": "system", "content": "You are Greta, a grumpy blacksmith."})"));

// When the player speaks:
void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Add the player's message
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "user", "content": "%s"})"),
        *PlayerMessage.ReplaceCharWithEscapedChar()
    ));

    // Build the messages array
    FString Messages = FString::Join(ConversationMessages, TEXT(","));
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [%s],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *Messages);

    // Start streaming
    AI->ChatCompletionStream(RequestJson);
}

// When the response completes, store it:
void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    // Add the assistant's reply to history for the next turn
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "assistant", "content": "%s"})"),
        *FullResponse.ReplaceCharWithEscapedChar()
    ));
}
```
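The snippets above splice player text into JSON with `Printf`, so correct escaping is load-bearing: an unescaped quote or newline in the player's message would corrupt the request. As a plain-C++ illustration of what such an escape pass must cover (a sketch of the idea, not UE's actual `ReplaceCharWithEscapedChar` implementation):

```cpp
#include <string>

// Escape a string for safe embedding inside a JSON string literal.
// Illustrative stand-in for FString::ReplaceCharWithEscapedChar().
std::string EscapeForJson(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;  // quote would end the literal
            case '\\': out += "\\\\"; break;  // backslash starts an escape
            case '\n': out += "\\n";  break;  // raw newlines are invalid JSON
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}
```

For anything beyond short demo strings, building the request with a real JSON writer (UE ships `FJsonObject`/`FJsonSerializer`) is sturdier than string formatting.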
## Step 7: GPU Scheduling (Optional)

Dynamically control how GPU time is shared between rendering and AI inference:

```cpp
UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();

// During intense combat — rendering comes first
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeGraphics);

// During dialogue — faster AI responses
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeCompute);

// Default balanced mode
AI->SetSchedulingMode(EAtelicoSchedulingMode::Balance);

// Set limits
AI->SetVramBudgetMb(4096); // Cap GPU memory for AI
AI->SetTargetTps(15);      // Limit token generation speed
```
## Blueprint Support

All major API methods are marked `BlueprintCallable`. You can build AI workflows entirely in Blueprints:

- Get a reference to the Atelico AI subsystem via **Get Game Instance > Get Subsystem**
- Call **Chat Completion** or **Chat Completion Stream**
- Bind to the **On Token Received**, **On Chat Completed**, and **On Chat Failed** events
- Connect the **Accumulated** output to a Text widget
## Audio: TTS & STT

The subsystem exposes `SynthesizeAudio` (blocking TTS), `TranscribeAudio` (blocking STT), and `SynthesizeAudioStream` (streaming TTS via Tick-driven delegates). All three are `BlueprintCallable` under the **Atelico AI | Audio** category. Audio bytes cross the API boundary as base64-encoded WAV files.

```cpp
// Blocking TTS — returns base64 WAV in the response JSON.
FString Resp = Atelico->SynthesizeAudio(TEXT(R"({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart"
})"));
// Resp: {"audio_b64":"UklGRn...","duration_seconds":..,"format":"wav","sample_rate":24000}

// Streaming TTS — chunks arrive on the OnAudioChunkReceived multicast delegate.
Atelico->OnAudioChunkReceived.AddDynamic(this, &AMyActor::HandleAudioChunk);
Atelico->OnAudioCompleted.AddDynamic(this, &AMyActor::HandleAudioCompleted);
Atelico->OnAudioFailed.AddDynamic(this, &AMyActor::HandleAudioFailed);

Atelico->SynthesizeAudioStream(TEXT(R"({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba"
})"));

// Blocking STT — pass a base64-encoded WAV file.
FString SttResp = Atelico->TranscribeAudio(FString::Printf(
    TEXT(R"({"model":"in-memory::whisper","audio_b64":"%s"})"), *WavB64));
// SttResp: {"text":"...","language":"en","duration":...}
```
In Blueprint:

- Drag from the subsystem reference and call **Synthesize Audio Stream**.
- Bind **On Audio Chunk Received** to a function that base64-decodes `audio` and feeds it into a `USoundWave` (e.g. via the Runtime Audio Importer plugin) attached to an `AudioComponent`.
- Bind **On Audio Completed** for end-of-utterance cleanup, and **On Audio Failed** for error handling.
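If you handle the chunk in C++ instead of Blueprint, the decode step is plain base64. A self-contained standard-C++ sketch (illustrative only; in-engine you would more likely reach for the engine's own base64 utilities):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode a standard-alphabet base64 string into raw bytes,
// skipping '=' padding. Illustrative, not production-hardened.
std::vector<uint8_t> DecodeBase64(const std::string& in) {
    auto val = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;                       // '=' padding or non-alphabet char
    };
    std::vector<uint8_t> out;
    int buffer = 0, bits = 0;
    for (char c : in) {
        int v = val(c);
        if (v < 0) continue;             // skip padding
        buffer = (buffer << 6) | v;
        bits += 6;
        if (bits >= 8) {                 // a full byte is available
            bits -= 8;
            out.push_back(uint8_t((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

The decoded bytes for a full utterance form a WAV file, so the first four bytes of a complete `audio_b64` payload decode to the `RIFF` magic.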
Whisper variant IDs: `whisper` (default → `whisper-base.en`), `whisper-tiny[.en]`, `whisper-base[.en]`, `whisper-small[.en]`, `whisper-medium[.en]`, `whisper-large-v3[-turbo]`, `distil-large-v3`. TTS IDs: `tts` (default → `kokoro-82m`), `kokoro`, `kokoro-82m`, `pocket`, `pocket-tts`. See the Audio guide for voices, voice cloning, and quantization options.
## Next Steps
- Structured Generation — force JSON output with emotion tags for driving NPC animations
- Audio (TTS & STT) — voices, voice cloning, quantization, env vars
- Unreal API Reference — full list of all subsystem methods
- Chat Completions API — detailed HTTP API reference (same JSON format)