# Unreal: Getting Started

This guide walks you through setting up the Atelico AI Engine in an Unreal Engine 5 project and building an interactive NPC dialogue system with streaming text.
## What You'll Build

An actor that sends messages to an LLM and displays the response token-by-token in a UMG text widget — the foundation for any AI-driven NPC dialogue system.
By the end, you'll understand:
- How to install the plugin and configure it in Project Settings
- How to access the AI subsystem from any actor
- How to make a blocking chat request
- How to stream tokens with delegates for real-time dialogue
- How to maintain conversation history
## Prerequisites

- Unreal Engine 5.3 or later
- The Atelico server bundle (`atelico-server` binary + `atelico-asset-downloader`)
- A downloaded model:

  ```bash
  ./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
  ```
## Step 1: Install the Plugin

### Copy the Plugin

Copy the `AtelicoAI` folder into your project's `Plugins/` directory:

```
YourProject/
├── Plugins/
│   └── AtelicoAI/          <-- copy from plugins/unreal/AtelicoAI/
│       ├── AtelicoAI.uplugin
│       └── Source/
├── Source/
└── YourProject.uproject
```
### Add the Native Library

The plugin needs the compiled `atelico_ffi` native library. Copy it to the plugin's `ThirdParty` directory:

| Platform | File | Destination |
|---|---|---|
| Windows (CUDA) | `atelico_ffi.dll` + `atelico_ffi.dll.lib` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Win64/` |
| macOS | `libatelico_ffi.a` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Mac/` |
| Linux | `libatelico_ffi.a` | `Plugins/AtelicoAI/Source/ThirdParty/lib/Linux/` |

On Windows with CUDA, also copy the CUDA runtime DLLs (`cublas64_12.dll`, `cublasLt64_12.dll`, `curand64_10.dll`) next to `atelico_ffi.dll`.
### Regenerate Project Files

```bash
# Windows
./GenerateProjectFiles.bat

# macOS
./GenerateProjectFiles.sh
```

Rebuild the project. The plugin module `AtelicoAIRuntime` compiles and links against the native library automatically.
### Add Module Dependency

In your game module's `.Build.cs`, add the dependency:

```csharp
PublicDependencyModuleNames.AddRange(new string[]
{
    "AtelicoAIRuntime"
});
```
## Step 2: Configure in Project Settings

Open **Edit > Project Settings > Atelico AI**. The key settings:

| Setting | Recommended Value | Description |
|---|---|---|
| Default Model ID | `meta-llama/Llama-3.2-3B-Instruct-Q4_K_M` | Model loaded on startup |
| Scheduling Mode | `Balance` | How GPU time is split between rendering and inference |
| VRAM Budget (MB) | `0` (unlimited) | Set a cap if your game uses significant VRAM |
The engine initializes automatically when the Game Instance is created — no manual setup code needed.
## Step 3: Access the AI Subsystem

The plugin uses UE5's `UGameInstanceSubsystem` pattern. Access it from any actor:

```cpp
#include "AtelicoAISubsystem.h"

void AMyActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI)
    {
        UE_LOG(LogTemp, Error, TEXT("Atelico AI subsystem not available"));
        return;
    }

    // AI is ready to use
}
```
The subsystem:
- Is created automatically with the Game Instance
- Persists across level transitions
- Ticks every frame (handles stream polling internally)
- Is destroyed when the game exits
## Step 4: Blocking Chat Request

The simplest way to get a response — blocks the game thread until complete:

```cpp
void AMyActor::AskNPC()
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();

    FString RequestJson = TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are a friendly tavern keeper named Boris. Keep responses under 2 sentences."},
            {"role": "user", "content": "What's on the menu today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })");

    FString ResponseJson = AI->ChatCompletion(RequestJson);
    UE_LOG(LogTemp, Log, TEXT("Response: %s"), *ResponseJson);
}
```
Blocking calls freeze the game thread. Use them for quick tasks (embeddings, classification) or during loading screens. For dialogue during gameplay, use streaming (next step).
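If a blocking call is unavoidable mid-game, run it off the game thread and collect the result later. A minimal plain-C++ sketch of that pattern, with `std::async` standing in for UE's thread-pool async facilities and `ChatCompletionBlocking` as a hypothetical stand-in for the subsystem call:

```cpp
#include <future>
#include <string>

// Hypothetical stand-in for the blocking AI->ChatCompletion() call.
std::string ChatCompletionBlocking(const std::string& requestJson) {
    // Pretend the model thought for a while and answered.
    return "{\"content\":\"Stew and black bread.\"}";
}

// Launch the blocking call on a worker thread so the caller (the "game
// thread") keeps ticking. In engine code, UE's thread-pool async utilities
// play this role instead of std::async.
std::future<std::string> ChatCompletionAsync(const std::string& requestJson) {
    return std::async(std::launch::async, ChatCompletionBlocking, requestJson);
}
```

The caller polls or waits on the future at a convenient point (for example, once per frame with `wait_for(0ms)`) rather than stalling the frame where the request was issued.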
## Step 5: Streaming with Delegates

Streaming returns tokens as they're generated, using UE5's delegate system:

```cpp
// MyNPCActor.h
#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "AtelicoAISubsystem.h"
#include "MyNPCActor.generated.h"

UCLASS()
class AMyNPCActor : public AActor
{
    GENERATED_BODY()

public:
    virtual void BeginPlay() override;

    UFUNCTION(BlueprintCallable)
    void SayToNPC(const FString& PlayerMessage);

private:
    UFUNCTION()
    void OnTokenReceived(const FString& Token, const FString& Accumulated);

    UFUNCTION()
    void OnChatCompleted(const FString& FullResponse);

    UFUNCTION()
    void OnChatFailed(const FString& Error);

    // The UI text widget — assign in Blueprint
    UPROPERTY(EditAnywhere, BlueprintReadWrite, meta = (AllowPrivateAccess))
    class UTextBlock* DialogueText;
};
```
```cpp
// MyNPCActor.cpp
#include "MyNPCActor.h"
#include "Components/TextBlock.h"

void AMyNPCActor::BeginPlay()
{
    Super::BeginPlay();

    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (AI)
    {
        // Bind to the streaming delegates
        AI->OnTokenReceived.AddDynamic(this, &AMyNPCActor::OnTokenReceived);
        AI->OnChatCompleted.AddDynamic(this, &AMyNPCActor::OnChatCompleted);
        AI->OnChatFailed.AddDynamic(this, &AMyNPCActor::OnChatFailed);
    }
}

void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Clear the dialogue display
    if (DialogueText)
    {
        DialogueText->SetText(FText::GetEmpty());
    }

    // Build the request
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": "%s"}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *PlayerMessage.ReplaceCharWithEscapedChar());

    // Start streaming — returns immediately
    bool bStarted = AI->ChatCompletionStream(RequestJson);
    if (!bStarted)
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to start streaming"));
    }
}

void AMyNPCActor::OnTokenReceived(const FString& Token, const FString& Accumulated)
{
    // Called on the game thread as new tokens arrive;
    // Accumulated contains the full text so far
    if (DialogueText)
    {
        DialogueText->SetText(FText::FromString(Accumulated));
    }
}

void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    UE_LOG(LogTemp, Log, TEXT("NPC finished speaking"));
}

void AMyNPCActor::OnChatFailed(const FString& Error)
{
    UE_LOG(LogTemp, Error, TEXT("Inference error: %s"), *Error);
}
```
How it works internally:

- `ChatCompletionStream()` starts inference and stores the stream ID
- The subsystem's `Tick()` calls `atelico_engine_on_frame()` each frame
- It polls the stream for new tokens via `atelico_stream_poll()`
- When data arrives, it broadcasts `OnTokenReceived` with the new token and the accumulated text
- When the stream finishes, it broadcasts `OnChatCompleted` and cleans up

All callbacks fire on the game thread — no thread synchronization needed.
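The Tick-driven poll-and-broadcast pattern can be sketched in plain C++. The names below are illustrative, not the plugin's actual internals: a local queue stands in for the native `atelico_stream_poll()` call.

```cpp
#include <functional>
#include <queue>
#include <string>

// Minimal sketch of a Tick-driven token stream. The real subsystem polls
// the native FFI for tokens instead of reading a local queue.
struct FakeTokenStream {
    std::queue<std::string> pending;   // tokens the "engine" produced
    std::string accumulated;           // full text so far

    std::function<void(const std::string&, const std::string&)> onToken;
    std::function<void(const std::string&)> onCompleted;

    // Called once per frame, on the game thread, from Tick().
    void PollFrame() {
        while (!pending.empty()) {
            std::string token = pending.front();
            pending.pop();
            accumulated += token;
            if (onToken) onToken(token, accumulated);  // ~ OnTokenReceived
        }
    }

    // Called when the stream reports completion.
    void Finish() {
        if (onCompleted) onCompleted(accumulated);     // ~ OnChatCompleted
    }
};
```

Because `PollFrame()` only ever runs from `Tick()`, every callback fires on the game thread, which is exactly why the delegate handlers above can touch UMG widgets without locking.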
## Step 6: Multi-Turn Conversation

To maintain conversation history, build up the messages array across turns:
```cpp
// In your NPC actor header:
TArray<FString> ConversationMessages;

// Initialize in BeginPlay:
ConversationMessages.Add(TEXT(R"({"role": "system", "content": "You are Greta, a grumpy blacksmith."})"));

// When the player speaks:
void AMyNPCActor::SayToNPC(const FString& PlayerMessage)
{
    UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();
    if (!AI) return;

    // Add the player's message
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "user", "content": "%s"})"),
        *PlayerMessage.ReplaceCharWithEscapedChar()
    ));

    // Build the messages array
    FString Messages = FString::Join(ConversationMessages, TEXT(","));
    FString RequestJson = FString::Printf(TEXT(R"({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [%s],
        "max_tokens": 150,
        "temperature": 0.8
    })"), *Messages);

    // Start streaming
    AI->ChatCompletionStream(RequestJson);
}

// When the response completes, store it:
void AMyNPCActor::OnChatCompleted(const FString& FullResponse)
{
    // Add the assistant's reply to history for the next turn
    ConversationMessages.Add(FString::Printf(
        TEXT(R"({"role": "assistant", "content": "%s"})"),
        *FullResponse.ReplaceCharWithEscapedChar()
    ));
}
```
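The snippets above splice player text into JSON with `Printf`, so correct escaping is load-bearing: an unescaped quote or newline in the player's message would corrupt the request. As a plain-C++ illustration of what such an escape pass must cover (a sketch of the idea, not UE's actual `ReplaceCharWithEscapedChar` implementation):

```cpp
#include <string>

// Escape a string for safe embedding inside a JSON string literal.
// Illustrative stand-in for FString::ReplaceCharWithEscapedChar().
std::string EscapeForJson(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;  // quote would end the literal
            case '\\': out += "\\\\"; break;  // backslash starts an escape
            case '\n': out += "\\n";  break;  // raw newlines are invalid JSON
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}
```

For anything beyond short demo strings, building the request with a real JSON writer (UE ships `FJsonObject`/`FJsonSerializer`) is sturdier than string formatting.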
## Step 7: GPU Scheduling (Optional)

Dynamically control how GPU time is shared between rendering and AI inference:

```cpp
UAtelicoAISubsystem* AI = GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>();

// During intense combat — rendering comes first
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeGraphics);

// During dialogue — faster AI responses
AI->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeCompute);

// Default balanced mode
AI->SetSchedulingMode(EAtelicoSchedulingMode::Balance);

// Set limits
AI->SetVramBudgetMb(4096); // Cap GPU memory for AI
AI->SetTargetTps(15);      // Limit token generation speed
```
## Blueprint Support

All major API methods are marked `BlueprintCallable`. You can build AI workflows entirely in Blueprints:

- Get a reference to the Atelico AI subsystem via **Get Game Instance > Get Subsystem**
- Call **Chat Completion** or **Chat Completion Stream**
- Bind to the **On Token Received**, **On Chat Completed**, and **On Chat Failed** events
- Connect the **Accumulated** output to a Text widget
## Audio: TTS & STT

The subsystem exposes `SynthesizeAudio` (blocking TTS), `TranscribeAudio` (blocking STT), and `SynthesizeAudioStream` (streaming TTS via Tick-driven delegates). All three are `BlueprintCallable` under the **Atelico AI | Audio** category. Audio bytes cross the API boundary as base64-encoded WAV files.

```cpp
// Blocking TTS — returns base64 WAV in the response JSON.
FString Resp = Atelico->SynthesizeAudio(TEXT(R"({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart"
})"));
// Resp: {"audio_b64":"UklGRn...","duration_seconds":..,"format":"wav","sample_rate":24000}

// Streaming TTS — chunks arrive on the OnAudioChunkReceived multicast delegate.
Atelico->OnAudioChunkReceived.AddDynamic(this, &AMyActor::HandleAudioChunk);
Atelico->OnAudioCompleted.AddDynamic(this, &AMyActor::HandleAudioCompleted);
Atelico->OnAudioFailed.AddDynamic(this, &AMyActor::HandleAudioFailed);

Atelico->SynthesizeAudioStream(TEXT(R"({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba"
})"));

// Blocking STT — pass a base64-encoded WAV file.
FString SttResp = Atelico->TranscribeAudio(FString::Printf(
    TEXT(R"({"model":"in-memory::whisper","audio_b64":"%s"})"), *WavB64));
// SttResp: {"text":"...","language":"en","duration":...}
```
In Blueprint:

- Drag from the subsystem reference and call **Synthesize Audio Stream**.
- Bind **On Audio Chunk Received** to a function that base64-decodes `audio` and feeds it into a `USoundWave` (e.g. via the Runtime Audio Importer plugin) attached to an `AudioComponent`.
- Bind **On Audio Completed** for end-of-utterance cleanup, and **On Audio Failed** for error handling.
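If you handle the chunk in C++ instead of Blueprint, the decode step is plain base64. A self-contained standard-C++ sketch (illustrative only; in-engine you would more likely reach for the engine's own base64 utilities):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode a standard-alphabet base64 string into raw bytes,
// skipping '=' padding. Illustrative, not production-hardened.
std::vector<uint8_t> DecodeBase64(const std::string& in) {
    auto val = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;                       // '=' padding or non-alphabet char
    };
    std::vector<uint8_t> out;
    int buffer = 0, bits = 0;
    for (char c : in) {
        int v = val(c);
        if (v < 0) continue;             // skip padding
        buffer = (buffer << 6) | v;
        bits += 6;
        if (bits >= 8) {                 // a full byte is available
            bits -= 8;
            out.push_back(uint8_t((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

The decoded bytes for a full utterance form a WAV file, so the first four bytes of a complete `audio_b64` payload decode to the `RIFF` magic.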
Whisper variant IDs: `whisper` (default → `whisper-base.en`), `whisper-tiny[.en]`, `whisper-base[.en]`, `whisper-small[.en]`, `whisper-medium[.en]`, `whisper-large-v3[-turbo]`, `distil-large-v3`. TTS IDs: `tts` (default → `kokoro-82m`), `kokoro`, `kokoro-82m`, `pocket`, `pocket-tts`. See the Audio guide for voices, voice cloning, and quantization options.
## Next Steps
- Structured Generation — force JSON output with emotion tags for driving NPC animations
- Audio (TTS & STT) — voices, voice cloning, quantization, env vars
- Unreal API Reference — full list of all subsystem methods
- Chat Completions API — detailed HTTP API reference (same JSON format)