Unity: Getting Started
This guide walks you through setting up the Atelico AI Engine in a Unity project and building an interactive NPC dialogue system with streaming text.
What You'll Build
A simple scene where the player types messages to an NPC, and the NPC responds with AI-generated dialogue that streams word-by-word into a text box — like a typewriter effect.
By the end, you'll understand:
- How to install the plugin and add the engine to your scene
- How to send a chat request and get a response
- How to stream tokens for real-time dialogue display
- How to maintain conversation history for multi-turn NPC interactions
Prerequisites
- Unity 2022.3 or later
- The Atelico server bundle (`atelico-server` binary + `atelico-asset-downloader`)
- A downloaded model (see below)
Download a model before starting:

```shell
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
```
Step 1: Install the Plugin
The Atelico Unity SDK is a UPM (Unity Package Manager) package.
Option A: From Disk (Development)
If you have the engine source, add this to your `Packages/manifest.json`:

```json
{
  "dependencies": {
    "com.atelico.ai-engine": "file:../../plugins/unity/com.atelico.ai-engine"
  }
}
```
Or use the Package Manager UI: Add (+) > Add package from disk and select `plugins/unity/com.atelico.ai-engine/package.json`.
Option B: From a Release Bundle
Copy the `com.atelico.ai-engine` folder into your project's `Packages/` directory.
Copy the Native Library
The plugin needs the compiled native library for your platform:
| Platform | Source | Destination in Unity Project |
|---|---|---|
| macOS | `libatelico_ffi.dylib` | `Assets/Plugins/macOS/` |
| Windows | `atelico_ffi.dll` | `Assets/Plugins/x86_64/` |
| Linux | `libatelico_ffi.so` | `Assets/Plugins/x86_64/` |
On Windows with CUDA, also copy the CUDA runtime DLLs (`cublas64_12.dll`, `cublasLt64_12.dll`, `curand64_10.dll`) to the same folder.
Step 2: Add the Engine to Your Scene
- Create an empty GameObject in your scene
- Attach the `AtelicoEngine` MonoBehaviour to it
- The engine automatically persists across scene loads (`DontDestroyOnLoad`)
Only one `AtelicoEngine` instance should exist. Access it from anywhere via `AtelicoEngine.Instance`.
Step 3: Your First Chat Request (Blocking)
Create a new C# script and attach it to any GameObject:
```csharp
using UnityEngine;
using Atelico;

public class SimpleChat : MonoBehaviour
{
    void Start()
    {
        // Build an OpenAI-compatible chat request
        string requestJson = @"{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [
                {""role"": ""system"", ""content"": ""You are a friendly tavern keeper named Boris. Keep responses under 2 sentences.""},
                {""role"": ""user"", ""content"": ""What do you have on the menu today?""}
            ],
            ""max_tokens"": 100,
            ""temperature"": 0.7
        }";

        // Synchronous call — blocks until the response is ready
        string responseJson = AtelicoEngine.Instance.Llm.ChatCompletion(requestJson);
        Debug.Log(responseJson);
    }
}
```
Press Play. The first request takes a few seconds while the model loads into GPU memory. You'll see the full JSON response in the Console.
The blocking `ChatCompletion` call freezes the game until the response is complete. This is fine for testing, but for real gameplay you'll want streaming (next step).
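The response follows the OpenAI chat-completion shape, with the assistant's reply nested at `choices[0].message.content`. As a rough sketch of pulling that text out with the same minimal string-based approach this guide uses elsewhere (the `ChatResponseParser` name is ours; in a real project, prefer a proper JSON parser such as JsonUtility or Newtonsoft):

```csharp
using System;

// Minimal extraction of choices[0].message.content from a non-streaming
// chat completion response. Naive by design: assumes the first "content"
// after "message" is the reply, and only unescapes the common sequences.
public static class ChatResponseParser
{
    public static string ExtractMessageContent(string responseJson)
    {
        int msgIdx = responseJson.IndexOf("\"message\"", StringComparison.Ordinal);
        if (msgIdx < 0) return null;
        int idx = responseJson.IndexOf("\"content\":\"", msgIdx, StringComparison.Ordinal);
        if (idx < 0) return null;
        idx += 11; // length of "content":"
        int end = idx;
        // Scan for the closing quote, stepping over escaped characters
        while (end < responseJson.Length && responseJson[end] != '"')
        {
            if (responseJson[end] == '\\') end++;
            end++;
        }
        if (end >= responseJson.Length) return null;
        return responseJson.Substring(idx, end - idx)
            .Replace("\\n", "\n")
            .Replace("\\\"", "\"")
            .Replace("\\\\", "\\");
    }
}
```

With that in place, `ChatResponseParser.ExtractMessageContent(responseJson)` gives you just the NPC's line instead of the raw JSON.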
Step 4: Streaming for Typewriter Dialogue
Streaming returns tokens one at a time as they're generated, so you can display text progressively:
```csharp
using UnityEngine;
using TMPro;
using Atelico;

public class StreamingDialogue : MonoBehaviour
{
    [SerializeField] private TextMeshProUGUI dialogueText;

    public void AskNPC(string playerMessage)
    {
        dialogueText.text = "";

        string requestJson = $@"{{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [
                {{""role"": ""system"", ""content"": ""You are Boris, a grumpy tavern keeper. Keep responses under 3 sentences.""}},
                {{""role"": ""user"", ""content"": ""{EscapeJson(playerMessage)}""}}
            ],
            ""max_tokens"": 150,
            ""temperature"": 0.8
        }}";

        AtelicoEngine.Instance.Llm.ChatCompletionStream(
            requestJson,
            onChunk: (string chunkJson) =>
            {
                // Each chunk is a ChatCompletionChunk JSON — extract the token
                // For simplicity, append the raw delta content
                string token = ExtractToken(chunkJson);
                if (!string.IsNullOrEmpty(token))
                {
                    dialogueText.text += token;
                }
            },
            onComplete: () =>
            {
                Debug.Log("NPC finished speaking");
            },
            onError: (string error) =>
            {
                Debug.LogError($"Inference error: {error}");
            }
        );
    }

    // Extract delta.content from a ChatCompletionChunk JSON
    private string ExtractToken(string chunkJson)
    {
        // Minimal extraction — in production, use JsonUtility or Newtonsoft
        int idx = chunkJson.IndexOf("\"content\":\"");
        if (idx < 0) return null;
        idx += 11;
        int end = chunkJson.IndexOf("\"", idx);
        if (end < 0) return null;
        return chunkJson.Substring(idx, end - idx)
            .Replace("\\n", "\n")
            .Replace("\\\"", "\"");
    }

    private string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
    }
}
```
The callbacks fire on the main thread, so no coroutines or threading are needed. The engine polls the stream automatically every `Update()` frame.
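Tokens often arrive in bursts, which can make the raw `+=` approach type unevenly. One way to smooth the effect, sketched here as a plain class with names of our own choosing, is to buffer incoming tokens and reveal a fixed number of characters per second:

```csharp
using System.Text;

// Buffers streamed tokens and releases them at a steady characters-per-second
// rate, so dialogue types out evenly even when chunks arrive in bursts.
public class TypewriterBuffer
{
    private readonly StringBuilder pending = new StringBuilder();
    private readonly float charsPerSecond;
    private float charBudget; // fractional characters accumulated across frames

    public TypewriterBuffer(float charsPerSecond = 40f)
    {
        this.charsPerSecond = charsPerSecond;
    }

    // Call from the onChunk callback with each decoded token.
    public void Append(string token) => pending.Append(token);

    // Call once per frame with the frame's delta time;
    // returns the characters to reveal this frame.
    public string Take(float deltaTime)
    {
        if (pending.Length == 0) { charBudget = 0f; return ""; }
        charBudget += charsPerSecond * deltaTime;
        int count = (int)charBudget;
        if (count <= 0) return "";
        if (count > pending.Length) count = pending.Length;
        charBudget -= count;
        string chunk = pending.ToString(0, count);
        pending.Remove(0, count);
        return chunk;
    }
}
```

In `StreamingDialogue`, you would call `Append(token)` from `onChunk` and, in `Update()`, append `Take(Time.deltaTime)` to `dialogueText.text`.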
Step 5: Multi-Turn Conversation
To make the NPC remember what was said, maintain a conversation history:
```csharp
using System.Collections.Generic;
using System.Text;
using UnityEngine;
using TMPro;
using Atelico;

public class NPCConversation : MonoBehaviour
{
    [SerializeField] private TextMeshProUGUI dialogueText;
    [SerializeField] private TMP_InputField inputField;

    private List<string> conversationHistory = new();

    void Start()
    {
        // System prompt defines the NPC's personality
        conversationHistory.Add(
            @"{""role"": ""system"", ""content"": ""You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences.""}"
        );
    }

    public void OnPlayerSubmit()
    {
        string playerText = inputField.text;
        if (string.IsNullOrWhiteSpace(playerText)) return;
        inputField.text = "";
        dialogueText.text = "";

        // Add the player's message to history
        conversationHistory.Add(
            $@"{{""role"": ""user"", ""content"": ""{EscapeJson(playerText)}""}}"
        );

        // Build the request with full conversation history
        string messages = string.Join(",", conversationHistory);
        string requestJson = $@"{{
            ""model"": ""in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"",
            ""messages"": [{messages}],
            ""max_tokens"": 150,
            ""temperature"": 0.8
        }}";

        var responseBuilder = new StringBuilder();
        AtelicoEngine.Instance.Llm.ChatCompletionStream(
            requestJson,
            onChunk: (chunk) =>
            {
                string token = ExtractToken(chunk);
                if (!string.IsNullOrEmpty(token))
                {
                    responseBuilder.Append(token);
                    dialogueText.text += token;
                }
            },
            onComplete: () =>
            {
                // Store the assistant's response for future context
                conversationHistory.Add(
                    $@"{{""role"": ""assistant"", ""content"": ""{EscapeJson(responseBuilder.ToString())}""}}"
                );
            },
            onError: (error) => Debug.LogError(error)
        );
    }

    private string ExtractToken(string chunkJson)
    {
        int idx = chunkJson.IndexOf("\"content\":\"");
        if (idx < 0) return null;
        idx += 11;
        int end = chunkJson.IndexOf("\"", idx);
        if (end < 0) return null;
        return chunkJson.Substring(idx, end - idx)
            .Replace("\\n", "\n").Replace("\\\"", "\"");
    }

    private string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
    }
}
```
Now the NPC remembers the entire conversation. The player can reference earlier topics, and the NPC responds in context.
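One caveat: the history grows without bound, and a long session will eventually overflow the model's context window. A simple mitigation is to keep the system prompt and drop the oldest messages (the `HistoryTrimmer` name and the cap of 20 are our own choices; smarter strategies summarize old turns instead of discarding them):

```csharp
using System.Collections.Generic;

// Keeps the system prompt (index 0) and drops the oldest user/assistant
// messages once the history grows past maxMessages entries.
public static class HistoryTrimmer
{
    public static void Trim(List<string> history, int maxMessages = 20)
    {
        // history[0] is the system prompt; never remove it.
        while (history.Count > maxMessages)
        {
            history.RemoveAt(1); // oldest non-system message
        }
    }
}
```

Call `HistoryTrimmer.Trim(conversationHistory)` before building `requestJson` in `OnPlayerSubmit()`.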
Step 6: GPU Scheduling (Optional)
If the AI engine is running in-process alongside your renderer (not as a separate server), you can control how GPU time is shared:
```csharp
// During action-heavy gameplay — prioritize smooth rendering
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeGraphics);

// During dialogue scenes — prioritize fast AI responses
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeCompute);

// Default balanced mode
AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.Balance);
```
You can also set limits dynamically:
```csharp
AtelicoEngine.Instance.SetVramBudgetMb(4096); // Cap VRAM usage
AtelicoEngine.Instance.SetTargetTps(15);      // Limit token generation speed
AtelicoEngine.Instance.SetFrameTimeMs(16);    // 60 FPS target hint
```
Audio: TTS & STT
The AtelicoAudio subsystem handles speech synthesis (Kokoro / Pocket TTS) and transcription (Whisper). Audio bytes cross the marshaling boundary as base64-encoded WAV files.
```csharp
using System;
using UnityEngine;
using Atelico;

public class NpcVoiceController : MonoBehaviour
{
    public AudioSource audioSource;

    void Start()
    {
        // Blocking TTS — load the WAV bytes into a Unity AudioClip.
        string resp = AtelicoEngine.Instance.Audio.Synthesize(@"{
            ""model"": ""in-memory::tts"",
            ""input"": ""Welcome, traveler."",
            ""voice"": ""af_heart""
        }");
        // Parse resp, base64-decode "audio_b64", pass through your WAV → AudioClip helper.

        // Streaming TTS — start playback as soon as the first sentence is ready.
        AtelicoEngine.Instance.Audio.SynthesizeStream(@"{
            ""model"": ""in-memory::pocket-tts"",
            ""input"": ""First sentence. Second one comes right after."",
            ""voice"": ""alba""
        }",
        onChunk: chunkJson => {
            // chunkJson: {"sequence":N,"audio":"<base64 WAV>","duration_seconds":..,"text":".."}
            // queue the decoded clip into audioSource
        },
        onComplete: () => Debug.Log("done"),
        onError: err => Debug.LogError(err));
    }

    public void TranscribeMicClip(byte[] wavBytes)
    {
        string b64 = Convert.ToBase64String(wavBytes);
        string resp = AtelicoEngine.Instance.Audio.Transcribe(
            $"{{\"model\":\"in-memory::whisper\",\"audio_b64\":\"{b64}\"}}");
        // resp: {"text":"...","language":"en","duration":...}
    }
}
```
Whisper variant ids: `whisper` (default → `whisper-base.en`), `whisper-tiny[.en]`, `whisper-base[.en]`, `whisper-small[.en]`, `whisper-medium[.en]`, `whisper-large-v3[-turbo]`, `distil-large-v3`. TTS ids: `tts` (default → `kokoro-82m`), `kokoro`, `kokoro-82m`, `pocket`, `pocket-tts`. See the Audio guide for voices, voice cloning, and quantization options.
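The snippets above leave the WAV-to-AudioClip conversion to a helper of your own. Assuming the engine returns canonical 16-bit PCM WAV data with a standard 44-byte header (the `WavDecoder` name is ours, and real-world WAV files can contain extra chunks this sketch ignores), decoding might look like:

```csharp
using System;

// Converts 16-bit PCM WAV bytes (as returned base64-encoded by Synthesize)
// into normalized float samples, ready for AudioClip.SetData.
public static class WavDecoder
{
    // Returns samples in [-1, 1]; channel count and sample rate are read
    // from the standard 44-byte WAV header.
    public static float[] DecodePcm16(byte[] wav, out int channels, out int sampleRate)
    {
        channels = BitConverter.ToInt16(wav, 22);   // fmt chunk: NumChannels
        sampleRate = BitConverter.ToInt32(wav, 24); // fmt chunk: SampleRate
        const int headerSize = 44; // assumes a canonical header, no extra chunks
        int sampleCount = (wav.Length - headerSize) / 2;
        var samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            short s = BitConverter.ToInt16(wav, headerSize + i * 2);
            samples[i] = s / 32768f;
        }
        return samples;
    }
}
```

From there, `AudioClip.Create("tts", samples.Length / channels, channels, sampleRate, false)` followed by `clip.SetData(samples, 0)` gives a clip you can assign to `audioSource`.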
Next Steps
- Structured Generation — force the NPC to output JSON with emotion tags for driving animations
- Audio (TTS & STT) — voices, voice cloning, quantization, env vars
- Unity API Reference — full list of all methods and subsystems
- Chat Completions API — detailed HTTP API reference (same JSON format used by the SDK)