Version: 0.9

Unreal: Getting Started

This guide walks you through setting up the Atelico AI Engine in an Unreal Engine 5 project and building an interactive NPC dialogue system with streaming text.

What You'll Build

An actor that sends messages to an LLM and displays the response token-by-token in a UMG text widget — the foundation for any AI-driven NPC dialogue system.

By the end, you'll understand:

  1. How to install the plugin and configure it in Project Settings
  2. How to access the AI subsystem from any actor
  3. How to make a blocking chat request
  4. How to stream tokens with delegates for real-time dialogue
  5. How to maintain conversation history

Prerequisites

  • Unreal Engine 5.3 or later
  • The Atelico server bundle (atelico-server binary + atelico-asset-downloader)
  • A downloaded model:
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

Step 1: Install the Plugin

Copy the Plugin

Copy the AtelicoAI folder into your project's Plugins/ directory:

YourProject/
├── Plugins/
│   └── AtelicoAI/          <-- copy from plugins/unreal/AtelicoAI/
│       ├── AtelicoAI.uplugin
│       └── Source/
├── Source/
└── YourProject.uproject

Add the Native Library

The plugin needs the compiled atelico_ffi native library. Copy it to the plugin's ThirdParty directory:

| Platform | File | Destination |
| --- | --- | --- |
| Windows (CUDA) | atelico_ffi.dll + atelico_ffi.dll.lib | Plugins/AtelicoAI/Source/ThirdParty/lib/Win64/ |
| macOS | libatelico_ffi.a | Plugins/AtelicoAI/Source/ThirdParty/lib/Mac/ |
| Linux | libatelico_ffi.a | Plugins/AtelicoAI/Source/ThirdParty/lib/Linux/ |

On Windows with CUDA, also copy the CUDA runtime DLLs (cublas64_12.dll, cublasLt64_12.dll, curand64_10.dll) next to the DLL.

Regenerate Project Files

# Windows
./GenerateProjectFiles.bat

# macOS
./GenerateProjectFiles.sh

Rebuild the project. The plugin module AtelicoAIRuntime compiles and links against the native library automatically.

Add Module Dependency

In your game module's .Build.cs, add the dependency:

PublicDependencyModuleNames.AddRange(new string[]
{
    "AtelicoAIRuntime"
});

Step 2: Configure in Project Settings

Open Edit > Project Settings > Atelico AI. The key settings:

| Setting | Recommended Value | Description |
| --- | --- | --- |
| Default Model ID | meta-llama/Llama-3.2-3B-Instruct-Q4_K_M | Model loaded on startup |
| Scheduling Mode | Balance | How GPU time is split between rendering and inference |
| VRAM Budget (MB) | 0 (unlimited) | Set a cap if your game uses significant VRAM |

The engine initializes automatically when the Game Instance is created — no manual setup code needed.
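If you prefer to keep these settings in version control, they can usually be set in a config file as well. A sketch, assuming the plugin exposes its settings through a standard Unreal settings class (the section name [/Script/AtelicoAIRuntime.AtelicoAISettings] and the key names below are hypothetical; check the plugin's actual settings class for the exact paths):

```ini
; DefaultEngine.ini (section and key names are illustrative, not confirmed)
[/Script/AtelicoAIRuntime.AtelicoAISettings]
DefaultModelId=meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
SchedulingMode=Balance
VramBudgetMb=0
```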

Step 3: Access the AI Subsystem

The plugin uses UE5's UGameInstanceSubsystem pattern. Access it from any actor:

#include "AtelicoAISubsystem.h"

void AMyActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI)
    {
        UE_LOG(LogTemp, Error, TEXT("Atelico AI subsystem not available"));
        return;
    }

    // AI is ready to use
}

The subsystem:

  • Is created automatically with the Game Instance
  • Persists across level transitions
  • Ticks every frame (handles stream polling internally)
  • Is destroyed when the game exits

Step 4: Blocking Chat Request

The simplest way to get a response is a single blocking call, which holds the game thread until the full response is ready:

void AMyActor::AskNPC()
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    FString RequestJson = TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are a friendly tavern keeper named Boris. Keep responses under 2 sentences."},
            {"role": "user", "content": "What's on the menu today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })");

    FString ResponseJson = AI->ChatCompletion(RequestJson);
    UE_LOG(LogTemp, Log, TEXT("Response: %s"), *ResponseJson);
}
Note: Blocking calls freeze the game thread. Use them for quick tasks (embeddings, classification) or during loading screens. For dialogue during gameplay, use streaming (next step).

Step 5: Streaming with Delegates

Streaming returns tokens as they're generated, using UE5's delegate system:

// MyNPCActor.h
#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "AtelicoAISubsystem.h"
#include "MyNPCActor.generated.h"

UCLASS()
class AMyNPCActor : public AActor
{
    GENERATED_BODY()

public:
    virtual void BeginPlay() override;

    UFUNCTION(BlueprintCallable)
    void SayToNPC(const FString& PlayerMessage);

private:
    UFUNCTION()
    void OnTokenReceived(const FString& Token, const FString& Accumulated);

    UFUNCTION()
    void OnChatCompleted(const FString& FullResponse);

    UFUNCTION()
    void OnChatFailed(const FString& Error);

    // The UI text widget — assign in Blueprint
    UPROPERTY(EditAnywhere, BlueprintReadWrite, meta = (AllowPrivateAccess))
    class UTextBlock* DialogueText;
};
// MyNPCActor.cpp
#include "MyNPCActor.h"
#include "Components/TextBlock.h"

void AMyNPCActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (AI)
    {
        // Bind to the streaming delegates
        AI->OnTokenReceived.AddDynamic(this, &AMyNPCActor::OnTokenReceived);
        AI->OnChatCompleted.AddDynamic(this, &AMyNPCActor::OnChatCompleted);
        AI->OnChatFailed.AddDynamic(this, &AMyNPCActor::OnChatFailed);
    }
}

void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Clear the dialogue display
    if (DialogueText)
    {
        DialogueText->SetText(FText::GetEmpty());
    }

    // Build the request
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": "%s"}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *PlayerMessage.ReplaceCharWithEscapedChar());

    // Start streaming — returns immediately
    bool bStarted = AI->ChatCompletionStream(RequestJson);
    if (!bStarted)
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to start streaming"));
    }
}

void AMyNPCActor::OnTokenReceived(const FString& Token, const FString& Accumulated)
{
    // Called on the game thread whenever new token(s) arrive;
    // Accumulated contains the full text so far
    if (DialogueText)
    {
        DialogueText->SetText(FText::FromString(Accumulated));
    }
}

void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    UE_LOG(LogTemp, Log, TEXT("NPC finished speaking"));
}

void AMyNPCActor::OnChatFailed(const FString& Error)
{
    UE_LOG(LogTemp, Error, TEXT("Inference error: %s"), *Error);
}

How it works internally:

  1. ChatCompletionStream() starts inference and stores the stream ID
  2. The subsystem's Tick() calls atelico_engine_on_frame() each frame
  3. It polls the stream for new tokens via atelico_stream_poll()
  4. When data arrives, it broadcasts OnTokenReceived with the new token and the accumulated text
  5. When the stream finishes, it broadcasts OnChatCompleted and cleans up

All callbacks fire on the game thread — no thread synchronization needed.
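The tick-and-poll loop described above can be sketched in plain, engine-free C++. This is an illustrative model only: MockStream and StreamDriver are invented names, and the mock's Poll() merely stands in for the native atelico_stream_poll() call; it is not the plugin's actual implementation.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>

// Mock stream standing in for the native stream handle; yields one token per poll.
struct MockStream {
    std::deque<std::string> pending{"Hel", "lo", "!"};
    bool Poll(std::string& outToken) {           // stands in for atelico_stream_poll()
        if (pending.empty()) return false;
        outToken = pending.front();
        pending.pop_front();
        return true;
    }
    bool Finished() const { return pending.empty(); }
};

// Simplified model of the subsystem's Tick(): drain new tokens, fire callbacks.
struct StreamDriver {
    MockStream stream;
    std::string accumulated;
    std::function<void(const std::string&, const std::string&)> onToken;
    std::function<void(const std::string&)> onCompleted;
    bool active = true;

    void Tick() {                                // called once per frame
        if (!active) return;
        std::string token;
        while (stream.Poll(token)) {             // collect whatever arrived this frame
            accumulated += token;
            if (onToken) onToken(token, accumulated);
        }
        if (stream.Finished()) {                 // stream done: broadcast and clean up
            active = false;
            if (onCompleted) onCompleted(accumulated);
        }
    }
};
```

Because Tick() runs on the game thread and the callbacks fire synchronously inside it, consumers never need locks, which mirrors the guarantee the subsystem gives.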

Step 6: Multi-Turn Conversation

To maintain conversation history, build up the messages array across turns:

// In your NPC actor header:
TArray<FString> ConversationMessages;

// Initialize in BeginPlay:
ConversationMessages.Add(TEXT(R"({"role": "system", "content": "You are Greta, a grumpy blacksmith."})"));

// When the player speaks:
void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Add the player's message
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "user", "content": "%s"})"),
        *PlayerMessage.ReplaceCharWithEscapedChar()
    ));

    // Build the messages array
    FString Messages = FString::Join(ConversationMessages, TEXT(","));
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [%s],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *Messages);

    // Start streaming
    AI->ChatCompletionStream(RequestJson);
}

// When the response completes, store it:
void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    // Add the assistant's reply to history so the next turn has full context
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "assistant", "content": "%s"})"),
        *FullResponse.ReplaceCharWithEscapedChar()
    ));
}
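The escape-and-join pattern above is easy to get wrong (unescaped quotes in a player message break the JSON body). Here is the same logic sketched in standard C++, runnable outside the engine; EscapeJson and Conversation are illustrative helpers written for this guide, not plugin API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal JSON string escaping, similar in spirit to FString::ReplaceCharWithEscapedChar.
std::string EscapeJson(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// History kept as pre-serialized message objects, joined into one array per request.
struct Conversation {
    std::vector<std::string> messages;

    void Add(const std::string& role, const std::string& content) {
        messages.push_back("{\"role\": \"" + role +
                           "\", \"content\": \"" + EscapeJson(content) + "\"}");
    }

    // Builds the value for the "messages" field of the next request body.
    std::string MessagesArray() const {
        std::string joined;
        for (size_t i = 0; i < messages.size(); ++i) {
            if (i > 0) joined += ",";
            joined += messages[i];
        }
        return "[" + joined + "]";
    }
};
```

In a real project you would more likely build the body with Unreal's JSON utilities rather than string concatenation, which sidesteps escaping bugs entirely; the sketch just makes the data flow explicit.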

Step 7: GPU Scheduling (Optional)

Dynamically control how GPU time is shared between rendering and AI inference:

UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();

// During intense combat — rendering comes first
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeGraphics);

// During dialogue — faster AI responses
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeCompute);

// Default balanced mode
AI->SetSchedulingMode(EAtelicoSchedulingMode::Balance);

// Set limits
AI->SetVramBudgetMb(4096); // Cap GPU memory for AI
AI->SetTargetTps(15); // Limit token generation speed

Blueprint Support

All major API methods are marked BlueprintCallable. You can build AI workflows entirely in Blueprints:

  1. Get a reference to the Atelico AI subsystem via Get Game Instance > Get Subsystem
  2. Call Chat Completion or Chat Completion Stream
  3. Bind to On Token Received, On Chat Completed, and On Chat Failed events
  4. Connect the Accumulated output to a Text widget

Audio: TTS & STT

The subsystem exposes SynthesizeAudio (blocking TTS), TranscribeAudio (blocking STT), and SynthesizeAudioStream (streaming TTS via Tick-driven delegates). All three are BlueprintCallable under the Atelico AI | Audio category. Audio bytes cross the API boundary as base64-encoded WAV files.

// Blocking TTS — returns base64 WAV in the response JSON.
FString Resp = Atelico->SynthesizeAudio(TEXT(R"({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart"
})"));
// Resp: {"audio_b64":"UklGRn...","duration_seconds":..,"format":"wav","sample_rate":24000}

// Streaming TTS — chunks arrive on the OnAudioChunkReceived multicast delegate.
Atelico->OnAudioChunkReceived.AddDynamic(this, &AMyActor::HandleAudioChunk);
Atelico->OnAudioCompleted.AddDynamic(this, &AMyActor::HandleAudioCompleted);
Atelico->OnAudioFailed.AddDynamic(this, &AMyActor::HandleAudioFailed);

Atelico->SynthesizeAudioStream(TEXT(R"({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba"
})"));

// Blocking STT — pass a base64-encoded WAV file.
FString SttResp = Atelico->TranscribeAudio(FString::Printf(
    TEXT(R"({"model":"in-memory::whisper","audio_b64":"%s"})"), *WavB64));
// SttResp: {"text":"...","language":"en","duration":...}

In Blueprint:

  1. Drag from the subsystem reference and call Synthesize Audio Stream.
  2. Bind On Audio Chunk Received to a function that base64-decodes audio and feeds it into a USoundWave (e.g. via the Runtime Audio Importer plugin) attached to an AudioComponent.
  3. Bind On Audio Completed for end-of-utterance cleanup, and On Audio Failed for error handling.
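The base64 decode in step 2 can be sketched in standard C++ for clarity (in-engine code would normally reach for Unreal's FBase64 helper in Misc/Base64.h instead; this hand-rolled decoder is illustrative only):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Minimal base64 decoder for turning an audio_b64 payload into raw WAV bytes.
std::vector<uint8_t> Base64Decode(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> out;
    uint32_t buffer = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;                      // padding: payload is done
        size_t idx = alphabet.find(c);
        if (idx == std::string::npos) continue;   // skip whitespace/newlines
        buffer = (buffer << 6) | static_cast<uint32_t>(idx);
        bits += 6;
        if (bits >= 8) {                          // a full byte is available
            bits -= 8;
            out.push_back(static_cast<uint8_t>((buffer >> bits) & 0xFF));
            buffer &= (1u << bits) - 1;           // keep only the leftover bits
        }
    }
    return out;
}
```

Decoded chunks are raw WAV data, so the first chunk of a stream begins with the "RIFF" header bytes; later chunks are continuation audio to append to the playback buffer.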

Whisper variant ids: whisper (default → whisper-base.en), whisper-tiny[.en], whisper-base[.en], whisper-small[.en], whisper-medium[.en], whisper-large-v3[-turbo], distil-large-v3. TTS ids: tts (default → kokoro-82m), kokoro, kokoro-82m, pocket, pocket-tts. See the Audio guide for voices, voice cloning, and quantization options.

Next Steps