Version: 0.8

Godot: Getting Started

This guide walks you through setting up the Atelico AI Engine in a Godot 4 project and building an interactive NPC dialogue system with streaming text.

What You'll Build

A scene where the player types messages to an NPC, and the NPC responds with AI-generated dialogue that streams word-by-word into a Label — like a typewriter effect.

By the end, you'll understand:

  1. How to install the GDExtension and add engine nodes to your scene
  2. How to configure backends and load a model
  3. How to send a blocking chat request
  4. How to stream tokens with signals for real-time dialogue
  5. How to maintain conversation history

Prerequisites

  • Godot 4.2 or later
  • The Atelico server bundle (atelico-asset-downloader)
  • A downloaded model:
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

Step 1: Install the GDExtension

Copy the Files

Create the following structure in your Godot project:

your_project/
├── addons/
│   └── atelico/
│       ├── atelico_ai_engine.gdextension
│       └── bin/
│           ├── libatelico_ai_engine.dylib   (macOS)
│           ├── libatelico_ai_engine.so      (Linux)
│           └── atelico_ai_engine.dll        (Windows)
├── project.godot
└── ...

Create the .gdextension File

Create addons/atelico/atelico_ai_engine.gdextension with this content:

[configuration]
entry_symbol = "gdext_rust_init"
compatibility_minimum = 4.2

[libraries]
macos.release = "res://addons/atelico/bin/libatelico_ai_engine.dylib"
linux.release.x86_64 = "res://addons/atelico/bin/libatelico_ai_engine.so"
windows.release.x86_64 = "res://addons/atelico/bin/atelico_ai_engine.dll"

Enable the Plugin

Open Project > Project Settings > Plugins and enable "Atelico AI Engine".

Step 2: Set Up Your Scene

Add the engine node to your scene tree:

MainScene (Node)
├── AtelicoEngineNode
├── DialogueLabel (Label)
└── InputField (LineEdit)

In the editor, add an AtelicoEngineNode as a child node — it appears in the "Add Node" dialog after the plugin is enabled.
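
If you prefer to build the scene from code, here is a minimal sketch, assuming the extension registers the AtelicoEngineNode class globally (as GDExtensions normally do):

extends Node

func _ready():
    # Create the engine node at runtime instead of placing it in the editor.
    var engine = AtelicoEngineNode.new()
    engine.name = "AtelicoEngineNode"
    add_child(engine)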

Step 3: Initialize the Engine

Create a script on your main scene node:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

func _ready():
    # Configure GPU scheduling (optional)
    var singleton = Engine.get_singleton("AtelicoSingleton")
    singleton.set_gpu_scheduling_mode(1)  # 0=compute, 1=balance, 2=graphics
    singleton.set_vram_budget_mb(4096)

    # Define backends — in-memory for local on-device inference
    var backends = [
        {
            "name": "in-memory",
            "type": "in-memory",
            "config": {}
        }
    ]
    engine.initialize_engine(backends)

    # Pre-load the model (blocking — do this during a loading screen)
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    print("Engine ready!")

The first model_load call downloads the model to the local cache if needed. Subsequent launches load from cache.
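
Because model_load blocks, a bare call in _ready() freezes the window with no feedback. A small sketch that draws a loading message for one frame before blocking, using only plain Godot APIs:

func load_model_with_feedback() -> void:
    label.text = "Loading model..."
    # Wait one frame so the label is actually rendered before we block.
    await get_tree().process_frame
    engine.model_load("in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M")
    label.text = ""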

Step 4: Blocking Chat Request

The simplest way to get a response:

func ask_npc_sync(player_message: String) -> String:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Boris, a friendly tavern keeper. Keep responses under 2 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    })

    var response_json = engine.llm_chat(request)
    var response = JSON.parse_string(response_json)
    return response["choices"][0]["message"]["content"]
Note: Blocking calls freeze the game until the response is complete. Use them during loading screens or for very short responses. For gameplay dialogue, use streaming (next step).
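
When you do use the blocking call, guard against failures (missing model, out of VRAM) before indexing into the parsed response. A defensive sketch, assuming only the OpenAI-style response shape shown above; the exact error payload is backend-specific:

func ask_npc_safe(player_message: String) -> String:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": player_message}],
        "max_tokens": 100
    })
    var response = JSON.parse_string(engine.llm_chat(request))
    if response == null or not response.has("choices"):
        push_warning("Chat request failed: " + str(response))
        return "..."  # fallback line so the NPC never goes silent
    return response["choices"][0]["message"]["content"]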

Step 5: Streaming with Signals

Streaming delivers tokens one at a time via Godot signals. This is the recommended approach for dialogue during gameplay:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel

var stream_text := ""

func _ready():
    # ... initialization from Step 3 ...

    # Connect to streaming signals
    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)

func ask_npc(player_message: String) -> void:
    # Clear the display
    stream_text = ""
    label.text = ""

    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "system", "content": "You are Greta, a grumpy blacksmith. You secretly care about the player but never admit it. Keep responses under 3 sentences."},
            {"role": "user", "content": player_message}
        ],
        "max_tokens": 150,
        "temperature": 0.8
    })

    # Start streaming — returns immediately with a job_id
    var job_id = engine.llm_chat_stream(request)
    print("Streaming started, job_id: ", job_id)

func _on_token(job_id: int, chunk_json: String) -> void:
    # Called once per frame with new token data
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        print("NPC finished speaking: ", stream_text)
    else:
        print("Streaming failed")

How it works internally:

  1. llm_chat_stream() queues the request and returns immediately
  2. The engine's _process() dispatches the request to the backend on the next frame
  3. Each frame, _process() polls for new tokens and emits inference_token_generated
  4. When the stream ends, inference_completed fires

All signals fire on the main thread — no threading code needed.
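
Every signal carries the job_id that llm_chat_stream() returned, so you can guard against overlapping requests: if the player submits again before the previous stream finishes, drop the stale tokens. A sketch, where _build_request() stands in for the JSON.stringify call from ask_npc():

var current_job := -1

func ask_npc_guarded(player_message: String) -> void:
    stream_text = ""
    label.text = ""
    # Remember the newest stream; tokens from older ones are ignored below.
    current_job = engine.llm_chat_stream(_build_request(player_message))

func _on_token(job_id: int, chunk_json: String) -> void:
    if job_id != current_job:
        return  # stale token from a superseded stream; drop it
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text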

Step 6: Multi-Turn Conversation

Maintain a conversation history so the NPC remembers what was said:

extends Node

@onready var engine = $AtelicoEngineNode
@onready var label = $DialogueLabel
@onready var input_field = $InputField

var conversation: Array = []
var stream_text := ""

func _ready():
    # ... initialization ...

    engine.inference_token_generated.connect(_on_token)
    engine.inference_completed.connect(_on_stream_done)
    input_field.text_submitted.connect(_on_input_submitted)

    # Set the NPC's personality
    conversation.append({
        "role": "system",
        "content": "You are Greta, a grumpy blacksmith in a medieval village. You secretly care about the player but would never admit it. Keep responses under 3 sentences."
    })

func _on_input_submitted(text: String) -> void:
    if text.strip_edges().is_empty():
        return
    input_field.text = ""

    # Add the player's message to history
    conversation.append({"role": "user", "content": text})

    # Clear display and start streaming
    stream_text = ""
    label.text = ""

    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": conversation,
        "max_tokens": 150,
        "temperature": 0.8
    })
    engine.llm_chat_stream(request)

func _on_token(job_id: int, chunk_json: String) -> void:
    var chunk = JSON.parse_string(chunk_json)
    var delta = chunk["choices"][0]["delta"]
    if delta.has("content") and delta["content"] != null:
        stream_text += delta["content"]
        label.text = stream_text

func _on_stream_done(job_id: int, success: bool) -> void:
    if success:
        # Store the NPC's response for future context
        conversation.append({"role": "assistant", "content": stream_text})

Now the NPC remembers the entire conversation across turns.
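
Unbounded history will eventually overflow the model's context window. A sketch that trims before each request, keeping the system prompt plus the most recent turns; the budget of 20 messages is an arbitrary placeholder:

const MAX_MESSAGES := 20  # placeholder budget; tune for your model's context size

func trim_history() -> void:
    # Keep index 0 (the system prompt) plus the newest MAX_MESSAGES entries.
    if conversation.size() <= MAX_MESSAGES + 1:
        return
    var system_msg = conversation[0]
    conversation = [system_msg] + conversation.slice(conversation.size() - MAX_MESSAGES)

Call trim_history() at the top of _on_input_submitted(), before appending the new message.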

Step 7: GPU Scheduling (Optional)

Control how GPU time is shared between rendering and AI inference:

var singleton = Engine.get_singleton("AtelicoSingleton")

# During action gameplay — prioritize smooth rendering
singleton.set_gpu_scheduling_mode(2) # PRIORITIZE_GRAPHICS

# During dialogue scenes — prioritize fast AI responses
singleton.set_gpu_scheduling_mode(0) # PRIORITIZE_COMPUTE

# Default balanced mode
singleton.set_gpu_scheduling_mode(1) # BALANCE

# Dynamic limits
singleton.set_vram_budget_mb(4096)
singleton.set_target_tokens_per_second(15)
singleton.set_frame_time_ms(16) # 60 FPS target
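
In practice you flip modes with game state. A small sketch, wiring the calls above to hypothetical dialogue-UI callbacks:

func _on_dialogue_opened() -> void:
    # Dialogue UI is mostly static; give the GPU to inference.
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(0)

func _on_dialogue_closed() -> void:
    # Back to action; keep frames smooth.
    Engine.get_singleton("AtelicoSingleton").set_gpu_scheduling_mode(2)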

Async (Non-Blocking, Non-Streaming)

If you want a complete response without streaming but also without freezing the game thread, use the async variant:

func _ready():
    engine.async_request_completed.connect(_on_async_done)

func ask_npc_async(player_message: String) -> void:
    var request = JSON.stringify({
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": player_message}],
        "max_tokens": 100
    })
    var job_id = engine.llm_chat_async(request)

func _on_async_done(job_id: int, response_json: String) -> void:
    var response = JSON.parse_string(response_json)
    label.text = response["choices"][0]["message"]["content"]

This returns the full response at once (no token-by-token streaming) but doesn't block the game thread.
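
Because all responses arrive on one shared signal, track in-flight requests by job_id when several NPCs can think at once. A sketch; the pending dictionary and NPC names are illustrative:

var pending := {}  # job_id -> which NPC asked

func ask_two_npcs(player_message: String) -> void:
    for npc_name in ["Boris", "Greta"]:
        var request = JSON.stringify({
            "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
            "messages": [{"role": "user", "content": player_message}],
            "max_tokens": 100
        })
        pending[engine.llm_chat_async(request)] = npc_name

func _on_async_done(job_id: int, response_json: String) -> void:
    var npc_name = pending.get(job_id, "")
    pending.erase(job_id)
    var response = JSON.parse_string(response_json)
    print(npc_name, ": ", response["choices"][0]["message"]["content"])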

Audio: TTS & STT

The Godot node exposes three audio methods plus matching signals. Audio bytes cross the API boundary as base64-encoded WAV files; pair them with Marshalls.base64_to_raw and AudioStreamWAV to feed an AudioStreamPlayer.

# Blocking TTS
var resp_json: String = engine.audio_synthesize(JSON.stringify({
    "model": "in-memory::tts",
    "input": "Welcome, traveler.",
    "voice": "af_heart",
}))
var resp: Dictionary = JSON.parse_string(resp_json)
var wav: PackedByteArray = Marshalls.base64_to_raw(resp["audio_b64"])
# Convert wav → AudioStreamWAV and play through an AudioStreamPlayer.

# Streaming TTS — chunks arrive on the audio_synthesis_chunk signal,
# completion on audio_synthesis_completed.
engine.audio_synthesis_chunk.connect(_on_audio_chunk)
engine.audio_synthesis_completed.connect(_on_audio_done)

var job_id: int = engine.audio_synthesize_stream(JSON.stringify({
    "model": "in-memory::pocket-tts",
    "input": "First sentence. Second one comes right after.",
    "voice": "alba",
}))

func _on_audio_chunk(job_id: int, chunk_json: String) -> void:
    var chunk: Dictionary = JSON.parse_string(chunk_json)
    var bytes: PackedByteArray = Marshalls.base64_to_raw(chunk["audio"])
    # queue bytes into your AudioStreamPlayer; chunk["text"] is the source sentence

func _on_audio_done(job_id: int, success: bool) -> void:
    print("synth done: ", success)

# Blocking STT — pass a WAV file as base64
var wav_bytes: PackedByteArray = FileAccess.get_file_as_bytes("user://speech.wav")
var stt_json: String = engine.audio_transcribe(JSON.stringify({
    "model": "in-memory::whisper",
    "audio_b64": Marshalls.raw_to_base64(wav_bytes),
}))
print(JSON.parse_string(stt_json)["text"])
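
To fill in the "convert and play" step above, a minimal sketch that assumes the engine returns a canonical PCM WAV with a standard 44-byte header and 16-bit samples; production code should parse the fmt chunk properly:

func play_wav_bytes(wav: PackedByteArray, player: AudioStreamPlayer) -> void:
    var stream := AudioStreamWAV.new()
    stream.format = AudioStreamWAV.FORMAT_16_BITS  # assumes 16-bit PCM
    stream.stereo = wav.decode_u16(22) == 2        # channel count at offset 22
    stream.mix_rate = wav.decode_u32(24)           # sample rate at offset 24
    stream.data = wav.slice(44)                    # skip the RIFF/fmt header
    player.stream = stream
    player.play()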

Whisper variant ids: whisper (default → whisper-base.en), whisper-tiny[.en], whisper-base[.en], whisper-small[.en], whisper-medium[.en], whisper-large-v3[-turbo], distil-large-v3. TTS ids: tts (default → kokoro-82m), kokoro, kokoro-82m, pocket, pocket-tts. See the Audio guide for voices, voice cloning, and quantization options.
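
To pick a specific variant, swap its id into the model string using the same in-memory:: prefix as above, for example:

var stt_json: String = engine.audio_transcribe(JSON.stringify({
    "model": "in-memory::whisper-small.en",  # any variant id from the list above
    "audio_b64": Marshalls.raw_to_base64(wav_bytes),
}))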

Next Steps