Version: 0.8

Godot: Getting Started

This guide walks you through setting up the Atelico AI Engine in a Godot 4 project and building an interactive NPC dialogue system with streaming text.

What You'll Build

A scene where the player types messages to an NPC, and the NPC responds with AI-generated dialogue that streams word-by-word into a Label — like a typewriter effect.

By the end, you'll understand:

  1. How to install the GDExtension and add engine nodes to your scene
  2. How to configure backends and load a model
  3. How to send a blocking chat request
  4. How to stream tokens with signals for real-time dialogue
  5. How to maintain conversation history

Prerequisites

  • Godot 4.2 or later
  • The Atelico server bundle (atelico-asset-downloader)
  • A downloaded model:
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

Step 1: Install the GDExtension

Copy the Files

Create the following structure in your Godot project:

your_project/
├── addons/
│   └── atelico/
│       ├── atelico_ai_engine.gdextension
│       └── bin/
│           ├── libatelico_ai_engine.dylib   (macOS)
│           ├── libatelico_ai_engine.so      (Linux)
│           └── atelico_ai_engine.dll        (Windows)
├── project.godot
└── ...

Create the .gdextension File

Create addons/atelico/atelico_ai_engine.gdextension with this content:

[configuration]
entry_symbol = "gdext_rust_init"
compatibility_minimum = 4.2

[libraries]
macos.release = "res://addons/atelico/bin/libatelico_ai_engine.dylib"
linux.release.x86_64 = "res://addons/atelico/bin/libatelico_ai_engine.so"
windows.release.x86_64 = "res://addons/atelico/bin/atelico_ai_engine.dll"

Enable the Plugin

Open Project > Project Settings > Plugins and enable "Atelico AI Engine".

Step 2: Set Up Your Scene

Add the engine node to your scene tree:

MainScene (Node)
├── AtelicoEngineNode
├── DialogueLabel (Label)
└── InputField (LineEdit)

In the editor, add an AtelicoEngineNode as a child node — it appears in the "Add Node" dialog after the plugin is enabled.
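
If you prefer to build the scene from code, here is a minimal sketch, assuming the extension registers the AtelicoEngineNode class globally (as GDExtensions normally do):

extends Node

func _ready():
    # Create the engine node at runtime instead of placing it in the editor.
    var engine = AtelicoEngineNode.new()
    engine.name = "AtelicoEngineNode"
    add_child(engine)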

Step 3: Initialize the Engine

Create a script on your main scene node:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

func _ready():
    # Configure GPU scheduling (optional)
    var singleton = Engine.get_singleton("AtelicoSingleton")
    singleton.set_gpu_scheduling_mode(1)  # 0=compute, 1=balance, 2=graphics
    singleton.set_vram_budget_mb(4096)

    # Define backends — in-memory for local on-device inference
    var backends = [
        {
            "name": "in-memory",
            "type": "in-memory",
            "config": {}
        }
    ]
    engine.initialize_engine(backends)

    # Pre-load the model (blocking — do this during a loading screen)
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    print("Engine ready!")

The first model_load call downloads the model to the local cache if needed. Subsequent launches load from cache.
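
Because model_load blocks, a bare call in _ready() freezes the window with no feedback. A small sketch that draws a loading message for one frame before blocking, using only plain Godot APIs:

func load_model_with_feedback() -> void:
    label.text = "Loading model..."
    # Wait one frame so the label is actually rendered before we block.
    await get_tree().process_frame
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    label.text = ""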

Step 4: Blocking Chat Request

The simplest way to get a response:

func ask_npc_sync(player_message: String) -> String:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Boris, a friendly tavern keeper. Keep responses under 2 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })

    var response_json = engine.llm_chat(request)
    var response = JSON.parse_string(response_json)
    return response["choices"][0]["message"]["content"]
Note: Blocking calls freeze the game until the response is complete. Use them during loading screens or for very short responses. For gameplay dialogue, use streaming (next step).
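
When you do use the blocking call, guard against failures (missing model, out of VRAM) before indexing into the parsed response. A defensive sketch, assuming only the OpenAI-style response shape shown above; the exact error payload is backend-specific:

func ask_npc_safe(player_message: String) -> String:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": player_message}],
        "max_tokens": 100
    })
    var response = JSON.parse_string(engine.llm_chat(request))
    if response == null or not response.has("choices"):
        push_warning("Chat request failed: " + str(response))
        return "..."  # fallback line so the NPC never goes silent
    return response["choices"][0]["message"]["content"]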

Step 5: Streaming with Signals

Streaming delivers tokens one at a time via Godot signals. This is the recommended approach for dialogue during gameplay:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

var stream_text := ""

func _ready():
    # ... initialization from Step 3 ...

    # Connect to streaming signals
    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)

func ask_npc(player_message: String) -> void:
    # Clear the display
    stream_text = ""
    label.text = ""

    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })

    # Start streaming — returns immediately with a job_id
    var job_id = engine.llm_chat_stream(request)
    print("Streaming started, job_id: ", job_id)

func _on_token(job_id: int, chunk_json: String) -> void:
    # Called once per frame with new token data
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        print("NPC finished speaking: ", stream_text)
    else:
        print("Streaming failed")

How it works internally:

  1. llm_chat_stream() queues the request and returns immediately
  2. The engine's _process() dispatches the request to the backend on the next frame
  3. Each frame, _process() polls for new tokens and emits inference_token_generated
  4. When the stream ends, inference_completed fires

All signals fire on the main thread — no threading code needed.
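
Every signal carries the job_id that llm_chat_stream() returned, so you can guard against overlapping requests: if the player submits again before the previous stream finishes, drop the stale tokens. A sketch, where _build_request() stands in for the JSON.stringify call from ask_npc():

var current_job := -1

func ask_npc_guarded(player_message: String) -> void:
    stream_text = ""
    label.text = ""
    # Remember the newest stream; tokens from older ones are ignored below.
    current_job = engine.llm_chat_stream(_build_request(player_message))

func _on_token(job_id: int, chunk_json: String) -> void:
    if job_id != current_job:
        return  # stale token from a superseded stream; drop it
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text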

Step 6: Multi-Turn Conversation

Maintain a conversation history so the NPC remembers what was said:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel
@onready var input_field = $InputField

var conversation: Array = []
var stream_text := ""

func _ready():
    # ... initialization ...

    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)
    input_field.text_submitted.connect(_on_input_submitted)

    # Set the NPC's personality
    conversation.append({
        "role": "system",
        "content": "You are Greta, a grumpy blacksmith in a medieval village. You secretly care about the player but would never admit it. Keep responses under 3 sentences."
    })

func _on_input_submitted(text: String) -> void:
    if text.strip_edges().is_empty():
        return
    input_field.text = ""

    # Add the player's message to history
    conversation.append({"role": "user", "content": text})

    # Clear display and start streaming
    stream_text = ""
    label.text = ""

    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": conversation,
        "max_tokens": 150,
        "temperature": 0.8
    })
    engine.llm_chat_stream(request)

func _on_token(job_id: int, chunk_json: String) -> void:
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        # Store the NPC's response for future context
        conversation.append({"role": "assistant", "content": stream_text})

Now the NPC remembers the entire conversation across turns.
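
Unbounded history will eventually overflow the model's context window. A sketch that trims before each request, keeping the system prompt plus the most recent turns; the budget of 20 messages is an arbitrary placeholder:

const MAX_MESSAGES := 20  # placeholder budget; tune for your model's context size

func trim_history() -> void:
    # Keep index 0 (the system prompt) plus the newest MAX_MESSAGES entries.
    if conversation.size() <= MAX_MESSAGES + 1:
        return
    var system_msg = conversation[0]
    conversation = [system_msg] + conversation.slice(conversation.size() - MAX_MESSAGES)

Call trim_history() at the top of _on_input_submitted(), before appending the new message.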

Step 7: GPU Scheduling (Optional)

Control how GPU time is shared between rendering and AI inference:

var singleton = Engine.get_singleton("AtelicoSingleton")

# During action gameplay — prioritize smooth rendering
singleton.set_gpu_scheduling_mode(2) # PRIORITIZE_GRAPHICS

# During dialogue scenes — prioritize fast AI responses
singleton.set_gpu_scheduling_mode(0) # PRIORITIZE_COMPUTE

# Default balanced mode
singleton.set_gpu_scheduling_mode(1) # BALANCE

# Dynamic limits
singleton.set_vram_budget_mb(4096)
singleton.set_target_tokens_per_second(15)
singleton.set_frame_time_ms(16) # 60 FPS target
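
In practice you flip modes with game state. A small sketch, wiring the calls above to hypothetical dialogue-UI callbacks:

func _on_dialogue_opened() -> void:
    # Dialogue UI is mostly static; give the GPU to inference.
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(0)

func _on_dialogue_closed() -> void:
    # Back to action; keep frames smooth.
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(2)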

Async (Non-Blocking, Non-Streaming)

If you want a complete response without streaming but also without freezing the game thread, use the async variant:

func _ready():
    engine.async_request_completed.connect(_on_async_done)

func ask_npc_async(player_message: String) -> void:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": player_message}],
        "max_tokens": 100
    })
    var job_id = engine.llm_chat_async(request)

func _on_async_done(job_id: int, response_json: String) -> void:
    var response = JSON.parse_string(response_json)
    label.text = response["choices"][0]["message"]["content"]

This returns the full response at once (no token-by-token streaming) but doesn't block the game thread.
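
Because all responses arrive on one shared signal, track in-flight requests by job_id when several NPCs can think at once. A sketch; the pending dictionary and NPC names are illustrative:

var pending := {}  # job_id -> which NPC asked

func ask_two_npcs(player_message: String) -> void:
    for npc_name in ["Boris", "Greta"]:
        var request = JSON.stringify({
            "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
            "messages": [{"role": "user", "content": player_message}],
            "max_tokens": 100
        })
        pending[engine.llm_chat_async(request)] = npc_name

func _on_async_done(job_id: int, response_json: String) -> void:
    var npc_name = pending.get(job_id, "")
    pending.erase(job_id)
    var response = JSON.parse_string(response_json)
    print(npc_name, ": ", response["choices"][0]["message"]["content"])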

Audio: TTS & STT

The Godot node exposes three audio methods plus matching signals. Audio bytes cross the API boundary as base64-encoded WAV files; pair them with Marshalls.base64_to_raw and AudioStreamWAV to feed an AudioStreamPlayer.

# Blocking TTS
var resp_json: String = engine.audio_synthesize(JSON.stringify({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart",
}))
var resp: Dictionary = JSON.parse_string(resp_json)
var wav: PackedByteArray = Marshalls.base64_to_raw(resp["audio_b64"])
# Convert wav → AudioStreamWAV and play through an AudioStreamPlayer.

# Streaming TTS — chunks arrive on the audio_synthesis_chunk signal,
# completion on audio_synthesis_completed.
engine.audio_synthesis_chunk.connect(_on_audio_chunk)
engine.audio_synthesis_completed.connect(_on_audio_done)

var job_id: int = engine.audio_synthesize_stream(JSON.stringify({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba",
}))

func _on_audio_chunk(job_id: int, chunk_json: String) -> void:
    var chunk: Dictionary = JSON.parse_string(chunk_json)
    var bytes: PackedByteArray = Marshalls.base64_to_raw(chunk["audio"])
    # queue bytes into your AudioStreamPlayer; chunk["text"] is the source sentence

func _on_audio_done(job_id: int, success: bool) -> void:
    print("synth done: ", success)

# Blocking STT — pass a WAV file as base64
var wav_bytes: PackedByteArray = FileAccess.get_file_as_bytes("user://speech.wav")
var stt_json: String = engine.audio_transcribe(JSON.stringify({
    "model": "in-memory::whisper",
    "audio_b64": Marshalls.raw_to_base64(wav_bytes),
}))
print(JSON.parse_string(stt_json)["text"])
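
To fill in the "convert and play" step above, a minimal sketch that assumes the engine returns a canonical PCM WAV with a standard 44-byte header and 16-bit samples; production code should parse the fmt chunk properly:

func play_wav_bytes(wav: PackedByteArray, player: AudioStreamPlayer) -> void:
    var stream := AudioStreamWAV.new()
    stream.format = AudioStreamWAV.FORMAT_16_BITS  # assumes 16-bit PCM
    stream.stereo = wav.decode_u16(22) == 2        # channel count at offset 22
    stream.mix_rate = wav.decode_u32(24)           # sample rate at offset 24
    stream.data = wav.slice(44)                    # skip the RIFF/fmt header
    player.stream = stream
    player.play()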

Whisper variant ids: whisper (default → whisper-base.en), whisper-tiny[.en], whisper-base[.en], whisper-small[.en], whisper-medium[.en], whisper-large-v3[-turbo], distil-large-v3. TTS ids: tts (default → kokoro-82m), kokoro, kokoro-82m, pocket, pocket-tts. See the Audio guide for voices, voice cloning, and quantization options.
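
To pick a specific variant, swap its id into the model string using the same in-memory:: prefix as above, for example:

var stt_json: String = engine.audio_transcribe(JSON.stringify({
    "model": "in-memory::whisper-small.en",  # any variant id from the list above
    "audio_b64": Marshalls.raw_to_base64(wav_bytes),
}))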

Next Steps