# Atelico AI Engine
On-device AI inference engine for games and interactive applications. LLMs, image generation, embeddings, classifiers, guardrails, and a semantic memory system -- running locally on the player's hardware, with native integration for Godot, Unity, and Unreal Engine.
## Three Things That Make Atelico Different

### Creative Control
AI left to its own devices produces unreliable output. Atelico gives you the tools to keep it on script:
- Structured generation forces output into exact JSON schemas -- guaranteed valid data your game code can parse directly, every time
- Semantic KV Store lets you author dialogue, lore, and game data, then retrieve the right piece at the right moment based on meaning, not keywords
- Guardrails filter unsafe content at multiple levels (keyword blocklists, ML classifiers, LLM judges) with customizable presets and the option to rewrite rather than just block
- LoRA adapters hot-swap model personality at runtime -- different voices for different NPCs without loading separate models
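To make the structured-generation guarantee concrete, here is a minimal sketch of what a schema-constrained request body could look like. The `response_format`/`json_schema` field names follow the OpenAI convention the server mirrors; Atelico's exact structured-output fields, and the quest-reward schema itself, are illustrative assumptions.

```python
import json

# Hypothetical schema for a quest reward -- illustrative only.
reward_schema = {
    "type": "object",
    "properties": {
        "item": {"type": "string"},
        "gold": {"type": "integer", "minimum": 0},
    },
    "required": ["item", "gold"],
}

# OpenAI-style request body; the structured-output field names below are
# an assumption based on the OpenAI "response_format" convention.
payload = {
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [{"role": "user", "content": "Generate a quest reward."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "reward", "schema": reward_schema},
    },
}

body = json.dumps(payload)
print(json.loads(body)["response_format"]["type"])  # json_schema
```

Because output is constrained to the schema, the game code can deserialize the response straight into its reward type with no retry loop.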
### Runs in Your Game
The engine embeds directly in your game process with native SDKs for Godot, Unity, and Unreal Engine. Not a sidecar process, not an HTTP call to localhost -- in-process, sharing your GPU.
Frame-aware scheduling lets you control the priority:
- Prioritize Graphics during action sequences -- AI yields GPU time to keep FPS smooth
- Balance for normal gameplay -- even split between inference and rendering
- Prioritize Compute during dialogue scenes -- fastest AI responses
On NVIDIA GPUs with compatible drivers, hardware-level GPU sharing (Compute-in-Graphics) eliminates context-switching overhead entirely.
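The three priority modes above can be pictured as a per-frame GPU time budget. The sketch below is purely illustrative: the mode names mirror the list, but the split fractions, the 60 FPS frame time, and the `inference_budget_ms` helper are all assumptions, not the SDK's real API.

```python
from enum import Enum

class GpuPriority(Enum):
    GRAPHICS = "graphics"  # AI yields GPU time; keeps FPS smooth
    BALANCED = "balanced"  # even split between inference and rendering
    COMPUTE = "compute"    # fastest AI responses during dialogue

# Hypothetical per-frame budget, assuming a 16.6 ms frame at 60 FPS;
# the real scheduler's numbers and interface will differ.
FRAME_MS = 16.6
SHARE = {
    GpuPriority.GRAPHICS: 0.2,
    GpuPriority.BALANCED: 0.5,
    GpuPriority.COMPUTE: 0.8,
}

def inference_budget_ms(mode: GpuPriority) -> float:
    """Milliseconds of GPU time inference may use this frame."""
    return FRAME_MS * SHARE[mode]

print(round(inference_budget_ms(GpuPriority.BALANCED), 1))  # 8.3
```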
### Runs on Device
No cloud, no API keys, no per-token costs, no internet required. Models ship with your game as bundled assets.
- Metal on Apple Silicon (macOS, iOS)
- CUDA on NVIDIA GPUs (Windows, Linux)
- CPU everywhere else
- Quantized models (GGUF) run with as little as 0.8 GB of VRAM for a 1B model
- 1-bit models (Bonsai) compress an 8B model to ~1.15 GB
Player data never leaves the device.
## Two Ways to Integrate

### HTTP Server
An OpenAI-compatible REST API on port 11434. Any code that works with OpenAI works with Atelico by changing the base URL. Use this for prototyping, tools, content pipelines, or when your game communicates over HTTP.
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
       "messages": [{"role": "user", "content": "Hello!"}]}'
```
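The same request from Python, using only the standard library (any OpenAI client library also works by pointing its base URL at `http://localhost:11434/v1`). The send is commented out because it needs a running server:

```python
import json
import urllib.request

# Same request as the curl example above, built with Python's stdlib.
url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires a running Atelico server; uncomment to send.
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```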
### Native SDKs
Embed the engine directly inside your game process:
| SDK | Integration | Streaming Pattern |
|---|---|---|
| Godot | GDExtension (zero-overhead Rust binding) | Signals |
| Unity | UPM package (C# via P/Invoke) | Callbacks |
| Unreal | Plugin (UGameInstanceSubsystem) | Delegates / Blueprints |
| Python | Native module (PyO3) | Iterator |
| C/C++ | Static/dynamic library | Poll loop |
Both the server and SDKs use the same JSON request format -- code and prompts are portable between them.
## Capabilities
| Capability | Description |
|---|---|
| LLM Chat | Multi-turn conversation with system prompts, streaming, and temperature control |
| Structured Generation | Constrain output to a JSON Schema, regex, choice, or grammar -- guaranteed parseable |
| Text-to-Speech | On-device TTS via Kokoro 82M -- 54 voices, 9 languages, streaming audio |
| Speech-to-Text | On-device transcription via Whisper |
| Image Generation | Generate images from text prompts on-device (~1s on Apple Silicon) |
| Vision Embeddings | DINOv2 vision embeddings and MAETok image tokenization |
| Embeddings | Convert text to semantic vectors for similarity search |
| Semantic KV Store | Store authored content and retrieve by meaning with faceted filtering |
| Text Classifiers | Categorize text (intent detection, content moderation) |
| Guardrails | Layered safety: keyword filters, ML classifiers, LLM judges, content rewriting |
| LoRA Adapters | Hot-swap model personality at runtime without reloading |
| Multi-Backend Routing | Seamlessly mix local inference with cloud API proxies (OpenAI, etc.) |
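The Embeddings and Semantic KV Store rows both rest on the same idea: text becomes a vector, and retrieval picks the stored entry whose vector is closest to the query's. This toy sketch shows that mechanism with hand-made 3-d vectors and cosine similarity; the key names and vectors are invented for illustration, and real embeddings would come from the engine's embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d vectors standing in for real embedding output.
store = {
    "blacksmith_lore": [0.9, 0.1, 0.0],
    "dragon_rumor":    [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # embedding of the player's question

# Retrieve the entry whose meaning is closest to the query.
best = max(store, key=lambda k: cosine(query, store[k]))
print(best)  # blacksmith_lore
```

A real store adds faceted filtering on top (e.g. restrict candidates by NPC or quest before ranking), but the ranking step is this nearest-vector lookup.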
## Supported Models
| Architecture | Example Models | Formats |
|---|---|---|
| LLaMA / Mistral | Llama 3.x, Mistral 7B | SafeTensors, GGUF |
| Qwen 3 | Qwen3-0.6B through Qwen3-8B | SafeTensors, GGUF |
| Qwen 3.5 | Qwen3.5 (hybrid DeltaNet + attention) | SafeTensors, GGUF |
| Gemma 4 | Gemma 4 (MoE) | SafeTensors, GGUF |
| Parcae | Parcae (stable looped transformer) | SafeTensors |
| SmolLM | SmolLM2, SmolLM3 | SafeTensors, GGUF |
| Bonsai 1-bit | PrismML Bonsai 1.7B, 8B | SafeTensors |
| PixArt / Sana | PixArt-Alpha, Sana Sprint | SafeTensors |
| Kokoro | Kokoro 82M TTS | SafeTensors |
| Whisper | Whisper STT | SafeTensors |
Any HuggingFace model using a supported architecture works. Quantized (GGUF) models are recommended for shipping games -- smaller download, less VRAM, minimal quality loss.
## Supported Platforms
| Platform | GPU Backend | Server | Godot | Unity | Unreal | Python | C FFI |
|---|---|---|---|---|---|---|---|
| macOS (Apple Silicon) | Metal | Yes | Yes | Yes | Yes | Yes | Yes |
| Windows (NVIDIA) | CUDA | Yes | Yes | Yes | Yes | Yes | Yes |
| Linux (NVIDIA) | CUDA | Yes | Yes | Yes | Yes | Yes | Yes |
| Any platform | CPU | Yes | Yes | Yes | Yes | Yes | Yes |
| iOS | Metal | -- | -- | -- | -- | -- | Yes |
## Get Started
Server path -- quickest way to try the engine:
- Getting Started -- download a model, start the server, send your first request
- Chat Completions API -- streaming, multi-turn, temperature control
- Structured Generation -- force JSON output matching a schema
SDK path -- for shipping games, start with the guides:
- NPC Dialogue -- personality, streaming, multi-turn memory, emotion tags
- Structured Game Data -- quests, items, encounters as typed JSON
- Models -- choosing, downloading, and managing models