Version: 0.9

Unity: Getting Started

This guide walks you through setting up the Atelico AI Engine in a Unity project and building an interactive NPC dialogue system with streaming text.

What You'll Build

A simple scene where the player types messages to an NPC, and the NPC responds with AI-generated dialogue that streams word-by-word into a text box — like a typewriter effect.

By the end, you'll understand:

  1. How to install the plugin and add the engine to your scene
  2. How to send a chat request and get a response
  3. How to stream tokens for real-time dialogue display
  4. How to maintain conversation history for multi-turn NPC interactions

Prerequisites

  • Unity 2022.3 or later
  • The Atelico server bundle (atelico-server binary + atelico-asset-downloader)
  • A downloaded model (see below)

Download a model before starting:

./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

Step 1: Install the Plugin

The Atelico Unity SDK is a UPM (Unity Package Manager) package.

Option A: From Disk (Development)

If you have the engine source, add this to your Packages/manifest.json:

{
  "dependencies": {
    "com.atelico.ai-engine": "file:../../plugins/unity/com.atelico.ai-engine"
  }
}

Or use the Package Manager UI: Add (+) > Add package from disk and select plugins/unity/com.atelico.ai-engine/package.json.

Option B: From a Release Bundle

Copy the com.atelico.ai-engine folder into your project's Packages/ directory.

Copy the Native Library

The plugin needs the compiled native library for your platform:

  • macOS: libatelico_ffi.dylib → Assets/Plugins/macOS/
  • Windows: atelico_ffi.dll → Assets/Plugins/x86_64/
  • Linux: libatelico_ffi.so → Assets/Plugins/x86_64/

On Windows with CUDA, also copy the CUDA runtime DLLs (cublas64_12.dll, cublasLt64_12.dll, curand64_10.dll) to the same folder.
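
For example, assuming you built the engine from source (the checkout and build-output paths below are illustrative assumptions; a release bundle may place the library elsewhere), the macOS copy might look like:

```shell
# Paths are illustrative; adjust to your engine checkout and project layout.
mkdir -p MyGame/Assets/Plugins/macOS
cp atelico/target/release/libatelico_ffi.dylib MyGame/Assets/Plugins/macOS/
```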

Step 2: Add the Engine to Your Scene

  1. Create an empty GameObject in your scene
  2. Attach the AtelicoEngine MonoBehaviour to it
  3. The engine automatically persists across scene loads (DontDestroyOnLoad)

Only one AtelicoEngine instance should exist. Access it from anywhere via AtelicoEngine.Instance.
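
If a script can run before the engine object exists (script execution order is project-specific), a defensive access pattern looks like this sketch. The Instance property comes from above; the assumption that it returns null, rather than throwing, when no engine is in the scene should be verified against your SDK version:

```csharp
using UnityEngine;
using Atelico;

public class EngineProbe : MonoBehaviour
{
    void Start()
    {
        // Assumption: Instance is null (rather than throwing) when no
        // AtelicoEngine exists in the scene; verify for your SDK version.
        if (AtelicoEngine.Instance == null)
        {
            Debug.LogError("AtelicoEngine not found. Add it to the scene first.");
            return;
        }
        Debug.Log("Atelico engine is ready.");
    }
}
```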

Step 3: Your First Chat Request (Blocking)

Create a new C# script and attach it to any GameObject:

using UnityEngine;
using Atelico;

public class SimpleChat : MonoBehaviour
{
    void Start()
    {
        // Build an OpenAI-compatible chat request
        string requestJson = @"{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [
                {""role"": ""system"", ""content"": ""You are a friendly tavern keeper named Boris. Keep responses under 2 sentences.""},
                {""role"": ""user"", ""content"": ""What do you have on the menu today?""}
            ],
            ""max_tokens"": 100,
            ""temperature"": 0.7
        }";

        // Synchronous call — blocks until the response is ready
        string responseJson = AtelicoEngine.Instance.Llm.ChatCompletion(requestJson);
        Debug.Log(responseJson);
    }
}

Press Play. The first request takes a few seconds while the model loads into GPU memory. You'll see the full JSON response in the Console.

note

The blocking ChatCompletion call freezes the game until the response is complete. This is fine for testing, but for real gameplay you'll want streaming (next step).

Step 4: Streaming for Typewriter Dialogue

Streaming returns tokens one at a time as they're generated, so you can display text progressively:

using UnityEngine;
using TMPro;
using Atelico;

public class StreamingDialogue : MonoBehaviour
{
    [SerializeField] private TextMeshProUGUI dialogueText;

    public void AskNPC(string playerMessage)
    {
        dialogueText.text = "";

        string requestJson = $@"{{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [
                {{""role"": ""system"", ""content"": ""You are Boris, a grumpy tavern keeper. Keep responses under 3 sentences.""}},
                {{""role"": ""user"", ""content"": ""{EscapeJson(playerMessage)}""}}
            ],
            ""max_tokens"": 150,
            ""temperature"": 0.8
        }}";

        AtelicoEngine.Instance.Llm.ChatCompletionStream(
            requestJson,
            onChunk: (string chunkJson) =>
            {
                // Each chunk is a ChatCompletionChunk JSON — extract the token
                // For simplicity, append the raw delta content
                string token = ExtractToken(chunkJson);
                if (!string.IsNullOrEmpty(token))
                {
                    dialogueText.text += token;
                }
            },
            onComplete: () =>
            {
                Debug.Log("NPC finished speaking");
            },
            onError: (string error) =>
            {
                Debug.LogError($"Inference error: {error}");
            }
        );
    }

    // Extract delta.content from a ChatCompletionChunk JSON.
    // Minimal by design: it stops at the first quote, so content containing
    // an escaped quote gets truncated. Use JsonUtility or Newtonsoft.Json
    // in production.
    private string ExtractToken(string chunkJson)
    {
        int idx = chunkJson.IndexOf("\"content\":\"");
        if (idx < 0) return null;
        idx += 11; // skip past "content":"
        int end = chunkJson.IndexOf("\"", idx);
        if (end < 0) return null;
        return chunkJson.Substring(idx, end - idx)
            .Replace("\\n", "\n")
            .Replace("\\\"", "\"");
    }

    private string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\")
            .Replace("\"", "\\\"")
            .Replace("\n", "\\n")
            .Replace("\r", "\\r"); // input fields can carry \r on Windows
    }
}

The callbacks fire on the main thread — no coroutines or threading needed. The engine polls the stream automatically every Update() frame.

Step 5: Multi-Turn Conversation

To make the NPC remember what was said, maintain a conversation history:

using System.Collections.Generic;
using System.Text;
using UnityEngine;
using TMPro;
using Atelico;

public class NPCConversation : MonoBehaviour
{
    [SerializeField] private TextMeshProUGUI dialogueText;
    [SerializeField] private TMP_InputField inputField;

    private List<string> conversationHistory = new();

    void Start()
    {
        // System prompt defines the NPC's personality
        conversationHistory.Add(
            @"{""role"": ""system"", ""content"": ""You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences.""}"
        );
    }

    public void OnPlayerSubmit()
    {
        string playerText = inputField.text;
        if (string.IsNullOrWhiteSpace(playerText)) return;

        inputField.text = "";
        dialogueText.text = "";

        // Add the player's message to history
        conversationHistory.Add(
            $@"{{""role"": ""user"", ""content"": ""{EscapeJson(playerText)}""}}"
        );

        // Build the request with full conversation history
        string messages = string.Join(",", conversationHistory);
        string requestJson = $@"{{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [{messages}],
            ""max_tokens"": 150,
            ""temperature"": 0.8
        }}";

        var responseBuilder = new StringBuilder();

        AtelicoEngine.Instance.Llm.ChatCompletionStream(
            requestJson,
            onChunk: (chunk) =>
            {
                string token = ExtractToken(chunk);
                if (!string.IsNullOrEmpty(token))
                {
                    responseBuilder.Append(token);
                    dialogueText.text += token;
                }
            },
            onComplete: () =>
            {
                // Store the assistant's response for future context
                conversationHistory.Add(
                    $@"{{""role"": ""assistant"", ""content"": ""{EscapeJson(responseBuilder.ToString())}""}}"
                );
            },
            onError: (error) => Debug.LogError(error)
        );
    }

    // Minimal extraction; see Step 4 for caveats. Use a real JSON parser
    // in production.
    private string ExtractToken(string chunkJson)
    {
        int idx = chunkJson.IndexOf("\"content\":\"");
        if (idx < 0) return null;
        idx += 11;
        int end = chunkJson.IndexOf("\"", idx);
        if (end < 0) return null;
        return chunkJson.Substring(idx, end - idx)
            .Replace("\\n", "\n").Replace("\\\"", "\"");
    }

    private string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"")
            .Replace("\n", "\\n").Replace("\r", "\\r");
    }
}

Now the NPC remembers the entire conversation. The player can reference earlier topics, and the NPC responds in context.
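
One caveat: conversationHistory grows without bound, and a long conversation will eventually overflow the model's context window. A simple mitigation is to keep the system prompt plus only the most recent messages. This helper is a sketch; the message budget is an arbitrary assumption you should tune to your model:

```csharp
// Keep the system prompt (index 0) plus at most maxMessages recent entries.
// maxMessages = 20 is an arbitrary budget; tune it to your model's context size.
private void TrimHistory(List<string> history, int maxMessages = 20)
{
    while (history.Count > maxMessages + 1)
    {
        history.RemoveAt(1); // drop the oldest non-system message
    }
}
```

Call TrimHistory(conversationHistory) just before building the request in OnPlayerSubmit().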

Step 6: GPU Scheduling (Optional)

If the AI engine is running in-process alongside your renderer (not as a separate server), you can control how GPU time is shared:

// During action-heavy gameplay — prioritize smooth rendering
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeGraphics);

// During dialogue scenes — prioritize fast AI responses
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeCompute);

// Default balanced mode
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.Balance);

You can also set limits dynamically:

AtelicoEngine.Instance.SetVramBudgetMb(4096); // Cap VRAM usage
AtelicoEngine.Instance.SetTargetTps(15); // Limit token generation speed
AtelicoEngine.Instance.SetFrameTimeMs(16); // 60 FPS target hint
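
In practice you might flip modes as dialogue UI opens and closes. A sketch using the calls above (the method and enum names come from this guide; the hook-up points are assumptions for your own UI code):

```csharp
using UnityEngine;
using Atelico;

public class DialogueSchedulingSwitch : MonoBehaviour
{
    // Call when the dialogue window opens: favor fast token generation.
    public void OnDialogueOpened()
    {
        AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeCompute);
    }

    // Call when the player returns to action gameplay: favor frame rate.
    public void OnDialogueClosed()
    {
        AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeGraphics);
    }
}
```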

Audio: TTS & STT

The AtelicoAudio subsystem handles speech synthesis (Kokoro / Pocket TTS) and transcription (Whisper). Audio bytes cross the marshaling boundary as base64-encoded WAV files.

using System;
using UnityEngine;
using Atelico;

public class NpcVoiceController : MonoBehaviour
{
    public AudioSource audioSource;

    void Start()
    {
        // Blocking TTS — load the WAV bytes into a Unity AudioClip.
        string resp = AtelicoEngine.Instance.Audio.Synthesize(@"{
            ""model"": ""in-memory::tts"",
            ""input"": ""Welcome, traveler."",
            ""voice"": ""af_heart""
        }");
        // Parse resp, base64-decode "audio_b64", pass through your WAV → AudioClip helper.

        // Streaming TTS — start playback as soon as the first sentence is ready.
        AtelicoEngine.Instance.Audio.SynthesizeStream(@"{
            ""model"": ""in-memory::pocket-tts"",
            ""input"": ""First sentence. Second one comes right after."",
            ""voice"": ""alba""
        }",
        onChunk: chunkJson => {
            // chunkJson: {"sequence":N,"audio":"<base64 WAV>","duration_seconds":..,"text":".."}
            // queue the decoded clip into audioSource
        },
        onComplete: () => Debug.Log("done"),
        onError: err => Debug.LogError(err));
    }

    public void TranscribeMicClip(byte[] wavBytes)
    {
        string b64 = Convert.ToBase64String(wavBytes);
        string resp = AtelicoEngine.Instance.Audio.Transcribe(
            $"{{\"model\":\"in-memory::whisper\",\"audio_b64\":\"{b64}\"}}");
        // resp: {"text":"...","language":"en","duration":...}
    }
}
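
The code above leaves the WAV-to-AudioClip step to you. A minimal helper for 16-bit PCM WAV might look like this sketch; it assumes a standard 44-byte header and does not handle other encodings or extra chunks:

```csharp
using System;
using UnityEngine;

public static class WavUtility
{
    // Minimal 16-bit PCM WAV decoder. Assumes a standard 44-byte header;
    // real WAV files can carry extra chunks, so prefer a full parser in production.
    public static AudioClip ToAudioClip(byte[] wav, string name = "clip")
    {
        int channels = BitConverter.ToInt16(wav, 22);
        int sampleRate = BitConverter.ToInt32(wav, 24);
        const int headerSize = 44;
        int sampleCount = (wav.Length - headerSize) / 2; // 2 bytes per sample

        var samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            short s = BitConverter.ToInt16(wav, headerSize + i * 2);
            samples[i] = s / 32768f; // scale to [-1, 1]
        }

        var clip = AudioClip.Create(name, sampleCount / channels, channels, sampleRate, false);
        clip.SetData(samples, 0);
        return clip;
    }
}
```

Then playback is: audioSource.clip = WavUtility.ToAudioClip(Convert.FromBase64String(audioB64)); audioSource.Play();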

Available model ids:

  • Whisper (STT): whisper (default → whisper-base.en), whisper-tiny[.en], whisper-base[.en], whisper-small[.en], whisper-medium[.en], whisper-large-v3[-turbo], distil-large-v3
  • TTS: tts (default → kokoro-82m), kokoro, kokoro-82m, pocket, pocket-tts

See the Audio guide for voices, voice cloning, and quantization options.

Next Steps