# Godot: Getting Started

This guide walks you through setting up the Atelico AI Engine in a Godot 4 project and building an interactive NPC dialogue system with streaming text.
## What You'll Build

A scene where the player types messages to an NPC, and the NPC responds with AI-generated dialogue that streams word-by-word into a Label — like a typewriter effect.
By the end, you'll understand:
- How to install the GDExtension and add engine nodes to your scene
- How to configure backends and load a model
- How to send a blocking chat request
- How to stream tokens with signals for real-time dialogue
- How to maintain conversation history
## Prerequisites

- Godot 4.2 or later
- The Atelico server bundle (`atelico-asset-downloader`)
- A downloaded model:

```shell
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
```
## Step 1: Install the GDExtension

### Copy the Files

Create the following structure in your Godot project:

```
your_project/
├── addons/
│   └── atelico/
│       ├── atelico_ai_engine.gdextension
│       └── bin/
│           ├── libatelico_ai_engine.dylib   (macOS)
│           ├── libatelico_ai_engine.so      (Linux)
│           └── atelico_ai_engine.dll        (Windows)
├── project.godot
└── ...
```
### Create the .gdextension File

Create `addons/atelico/atelico_ai_engine.gdextension` with this content:

```ini
[configuration]
entry_symbol = "gdext_rust_init"
compatibility_minimum = 4.2

[libraries]
macos.release = "res://addons/atelico/bin/libatelico_ai_engine.dylib"
linux.release.x86_64 = "res://addons/atelico/bin/libatelico_ai_engine.so"
windows.release.x86_64 = "res://addons/atelico/bin/atelico_ai_engine.dll"
```
### Enable the Plugin

Open Project > Project Settings > Plugins and enable "Atelico AI Engine".
## Step 2: Set Up Your Scene

Add the engine node to your scene tree:

```
MainScene (Node)
├── AtelicoEngineNode
├── DialogueLabel (Label)
└── InputField (LineEdit)
```

In the editor, add an AtelicoEngineNode as a child node; it appears in the "Add Node" dialog once the plugin is enabled.
## Step 3: Initialize the Engine

Create a script on your main scene node:

```gdscript
extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

func _ready():
    # Configure GPU scheduling (optional)
    var singleton = Engine.get_singleton("AtelicoSingleton")
    singleton.set_gpu_scheduling_mode(1)  # 0=compute, 1=balance, 2=graphics
    singleton.set_vram_budget_mb(4096)

    # Define backends — in-memory for local on-device inference
    var backends = [
        {
            "name": "in-memory",
            "type": "in-memory",
            "config": {}
        }
    ]
    engine.initialize_engine(backends)

    # Pre-load the model (blocking — do this during a loading screen)
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    print("Engine ready!")
```

The first `model_load` call downloads the model to the local cache if needed; subsequent launches load from the cache.
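Since `model_load` blocks, running it in `_ready()` freezes whatever frame it lands on. One way to keep a loading screen animating is to move the call onto a background `Thread` — a sketch, assuming the `engine` node from Step 3 and that `model_load` is safe to call off the main thread (verify this for your build):

```gdscript
# Hypothetical loading-screen pattern: run the blocking model_load on a
# background Thread so the UI keeps rendering, then hop back to the main
# thread with call_deferred before touching scene-tree state.
var _load_thread: Thread

func start_model_load() -> void:
    _load_thread = Thread.new()
    _load_thread.start(_load_model)

func _load_model() -> void:
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    call_deferred("_on_model_loaded")

func _on_model_loaded() -> void:
    _load_thread.wait_to_finish()
    print("Model ready: hide the loading screen")
```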
## Step 4: Blocking Chat Request

The simplest way to get a response:

```gdscript
func ask_npc_sync(player_message: String) -> String:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Boris, a friendly tavern keeper. Keep responses under 2 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })
    var response_json = engine.llm_chat(request)
    var response = JSON.parse_string(response_json)
    return response["choices"][0]["message"]["content"]
```

Blocking calls freeze the game until the response is complete. Use them during loading screens or for very short responses. For gameplay dialogue, use streaming (next step).
## Step 5: Streaming with Signals

Streaming delivers tokens one at a time via Godot signals. This is the recommended approach for dialogue during gameplay:

```gdscript
extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

var stream_text := ""

func _ready():
    # ... initialization from Step 3 ...

    # Connect to streaming signals
    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)

func ask_npc(player_message: String) -> void:
    # Clear the display
    stream_text = ""
    label.text = ""

    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })

    # Start streaming — returns immediately with a job_id
    var job_id = engine.llm_chat_stream(request)
    print("Streaming started, job_id: ", job_id)

func _on_token(job_id: int, chunk_json: String) -> void:
    # Called once per frame with new token data
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        print("NPC finished speaking: ", stream_text)
    else:
        print("Streaming failed")
```

How it works internally:

- `llm_chat_stream()` queues the request and returns immediately
- The engine's `_process()` dispatches the request to the backend on the next frame
- Each frame, `_process()` polls for new tokens and emits `inference_token_generated`
- When the stream ends, `inference_completed` fires

All signals fire on the main thread — no threading code needed.
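Tokens can arrive in bursts, so writing `label.text` directly in `_on_token` produces an uneven reveal. If you want a steady typewriter cadence regardless of generation speed, buffer the streamed text and reveal characters at a fixed rate instead — a sketch using only `_process`, assuming the `stream_text` and `label` variables from the script above (drop the `label.text` assignment from `_on_token` if you use it):

```gdscript
# Optional typewriter pacing: reveal the buffered stream_text at a fixed
# character rate, independent of how fast tokens actually arrive.
var _visible_chars := 0.0
@export var chars_per_second := 40.0

func _process(delta: float) -> void:
    if _visible_chars < stream_text.length():
        _visible_chars = minf(_visible_chars + chars_per_second * delta, stream_text.length())
        label.text = stream_text.substr(0, int(_visible_chars))
```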
## Step 6: Multi-Turn Conversation

Maintain a conversation history so the NPC remembers what was said:

```gdscript
extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel
@onready var input_field = $InputField

var conversation: Array = []
var stream_text := ""

func _ready():
    # ... initialization ...
    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)

    # Set the NPC's personality
    conversation.append({
        "role": "system",
        "content": "You are Greta, a grumpy blacksmith in a medieval village. You secretly care about the player but would never admit it. Keep responses under 3 sentences."
    })

func _on_input_submitted(text: String) -> void:
    if text.strip_edges().is_empty():
        return
    input_field.text = ""

    # Add the player's message to history
    conversation.append({"role": "user", "content": text})

    # Clear display and start streaming
    stream_text = ""
    label.text = ""
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": conversation,
        "max_tokens": 150,
        "temperature": 0.8
    })
    engine.llm_chat_stream(request)

func _on_token(job_id: int, chunk_json: String) -> void:
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        # Store the NPC's response for future context
        conversation.append({"role": "assistant", "content": stream_text})
```

Now the NPC remembers the entire conversation across turns.
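An unbounded history makes every request longer (and slower) as the conversation grows. A minimal way to cap it is to keep the system message plus only the most recent turns — a sketch operating on the `conversation` array above; the constant and function name are illustrative, and the right cap depends on your model's context window:

```gdscript
# Keep the prompt bounded: retain conversation[0] (the system message set in
# _ready()) plus at most MAX_HISTORY of the newest user/assistant messages.
const MAX_HISTORY := 20

func trim_conversation() -> void:
    while conversation.size() > MAX_HISTORY + 1:
        conversation.remove_at(1)  # drop the oldest non-system message
```

Call it right before each `llm_chat_stream` request, e.g. at the top of `_on_input_submitted`.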
## Step 7: GPU Scheduling (Optional)

Control how GPU time is shared between rendering and AI inference:

```gdscript
var singleton = Engine.get_singleton("AtelicoSingleton")

# During action gameplay — prioritize smooth rendering
singleton.set_gpu_scheduling_mode(2)  # PRIORITIZE_GRAPHICS

# During dialogue scenes — prioritize fast AI responses
singleton.set_gpu_scheduling_mode(0)  # PRIORITIZE_COMPUTE

# Default balanced mode
singleton.set_gpu_scheduling_mode(1)  # BALANCE

# Dynamic limits
singleton.set_vram_budget_mb(4096)
singleton.set_target_tokens_per_second(15)
singleton.set_frame_time_ms(16)  # 60 FPS target
```
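In practice you would flip modes from whatever code opens and closes your dialogue UI. A sketch using only the singleton methods shown above (the handler names are illustrative):

```gdscript
# Hypothetical wiring: favor inference while the dialogue box is open,
# favor rendering the rest of the time.
func _on_dialogue_opened() -> void:
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(0)  # compute

func _on_dialogue_closed() -> void:
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(2)  # graphics
```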
## Async (Non-Blocking, Non-Streaming)

If you want a complete response without streaming, but also without freezing the game thread, use the async variant:

```gdscript
func _ready():
    engine.async_request_completed.connect(_on_async_done)

func ask_npc_async(player_message: String) -> void:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": player_message}],
        "max_tokens": 100
    })
    var job_id = engine.llm_chat_async(request)

func _on_async_done(job_id: int, response_json: String) -> void:
    var response = JSON.parse_string(response_json)
    label.text = response["choices"][0]["message"]["content"]
```

This returns the full response at once (no token-by-token streaming) but doesn't block the game thread.
## Audio: TTS & STT

The Godot node exposes three audio methods plus matching signals. Audio bytes cross the API boundary as base64-encoded WAV files; pair them with `Marshalls.base64_to_raw` and `AudioStreamWAV` to feed an `AudioStreamPlayer`.

```gdscript
# Blocking TTS
var resp_json: String = engine.audio_synthesize(JSON.stringify({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart",
}))
var resp: Dictionary = JSON.parse_string(resp_json)
var wav: PackedByteArray = Marshalls.base64_to_raw(resp["audio_b64"])
# Convert wav → AudioStreamWAV and play through an AudioStreamPlayer.

# Streaming TTS — chunks arrive on the audio_synthesis_chunk signal,
# completion on audio_synthesis_completed.
engine.audio_synthesis_chunk.connect(_on_audio_chunk)
engine.audio_synthesis_completed.connect(_on_audio_done)
var job_id: int = engine.audio_synthesize_stream(JSON.stringify({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba",
}))

func _on_audio_chunk(job_id: int, chunk_json: String) -> void:
    var chunk: Dictionary = JSON.parse_string(chunk_json)
    var bytes: PackedByteArray = Marshalls.base64_to_raw(chunk["audio"])
    # Queue bytes into your AudioStreamPlayer; chunk["text"] is the source sentence.

func _on_audio_done(job_id: int, success: bool) -> void:
    print("synth done: ", success)

# Blocking STT — pass a WAV file as base64
var wav_bytes: PackedByteArray = FileAccess.get_file_as_bytes("user://speech.wav")
var stt_json: String = engine.audio_transcribe(JSON.stringify({
    "model": "in-memory::whisper",
    "audio_b64": Marshalls.raw_to_base64(wav_bytes),
}))
print(JSON.parse_string(stt_json)["text"])
```

Whisper variant ids: `whisper` (default → `whisper-base.en`), `whisper-tiny[.en]`, `whisper-base[.en]`, `whisper-small[.en]`, `whisper-medium[.en]`, `whisper-large-v3[-turbo]`, `distil-large-v3`. TTS ids: `tts` (default → `kokoro-82m`), `kokoro`, `kokoro-82m`, `pocket`, `pocket-tts`. See the Audio guide for voices, voice cloning, and quantization options.
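The "convert wav → AudioStreamWAV" step above can be done by hand. A minimal sketch that assumes a canonical 44-byte RIFF header with 16-bit PCM data (a robust version would walk the RIFF chunks instead of using fixed offsets, and should check the format the engine actually emits):

```gdscript
# Turn decoded WAV bytes into a playable AudioStreamWAV.
# Offsets follow the canonical RIFF/WAVE layout: channel count at byte 22,
# sample rate at byte 24, PCM samples from byte 44 onward.
func wav_to_stream(wav: PackedByteArray) -> AudioStreamWAV:
    var stream := AudioStreamWAV.new()
    stream.format = AudioStreamWAV.FORMAT_16_BITS
    stream.mix_rate = wav.decode_u32(24)     # sample rate in Hz
    stream.stereo = wav.decode_u16(22) == 2  # channel count
    stream.data = wav.slice(44)              # raw PCM payload
    return stream

# Usage:
# $AudioStreamPlayer.stream = wav_to_stream(wav)
# $AudioStreamPlayer.play()
```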
## Next Steps

- Structured Generation — force JSON output with emotion tags for driving NPC animations
- Audio (TTS & STT) — voices, voice cloning, quantization, env vars
- Godot API Reference — full list of all nodes, methods, and signals
- Chat Completions API — detailed API reference (same JSON format)