C FFI: Getting Started
The C FFI is the low-level interface for embedding the Atelico AI Engine in any language or engine that can call C functions. Unity and Unreal use this under the hood, but you can use it directly for custom engines, Swift/Kotlin mobile apps, or any other integration.
What You'll Build
A C/C++ program that initializes the engine, loads a model, streams a chat response token-by-token, and shuts down cleanly.
By the end, you'll understand:
- How to link the native library
- The engine lifecycle (create / use / destroy)
- How to make a blocking chat request
- How to stream tokens with the poll pattern
- Error handling and string ownership
Prerequisites
- A C or C++ compiler
- The Atelico native library (`libatelico_ffi.a` or `atelico_ffi.dll`)
- The `atelico_ffi.h` header file
- A downloaded model:
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
Step 1: Link the Library
Static Linking (Recommended)
# Compile with static library (macOS)
clang++ -std=c++17 main.cpp -L./lib -latelico_ffi -framework Metal -framework Foundation -o my_app
# Linux
g++ -std=c++17 main.cpp -L./lib -latelico_ffi -lpthread -ldl -lm -o my_app
Dynamic Linking
# macOS
clang++ -std=c++17 main.cpp -L./lib -latelico_ffi -o my_app
# Run with: DYLD_LIBRARY_PATH=./lib ./my_app
# Linux
g++ -std=c++17 main.cpp -L./lib -latelico_ffi -o my_app
# Run with: LD_LIBRARY_PATH=./lib ./my_app
# Windows (MSVC)
cl /EHsc main.cpp /link atelico_ffi.dll.lib
# atelico_ffi.dll must be next to the exe at runtime
Include the Header
#include "atelico_ffi.h"
Step 2: Engine Lifecycle
#include "atelico_ffi.h"
#include <cstdio>
int main()
{
// Create the engine (NULL config = auto-detect GPU, use defaults)
AtelicoEngine* engine = nullptr;
int32_t rc = atelico_engine_create(nullptr, &engine);
if (rc != ATELICO_OK || engine == nullptr)
{
fprintf(stderr, "Failed to create engine: %s\n", atelico_last_error());
return 1;
}
printf("Engine created successfully\n");
// ... use the engine ...
// Shut down and free all resources
atelico_engine_destroy(engine);
return 0;
}
Key rules:
- `atelico_engine_create` must be called from the main thread
- `atelico_engine_destroy` is safe to call on `nullptr`
- Only one engine instance should exist at a time
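In C++, the create/use/destroy lifecycle maps naturally onto RAII, so the engine is freed on every exit path without manual cleanup. A hedged sketch follows; the stub definitions at the top exist only so the snippet compiles standalone — in a real project, delete them and include `atelico_ffi.h` instead:

```cpp
#include <cstdint>
#include <memory>

// --- Stand-in stubs so this sketch is self-contained. In your project,
// --- remove this section and #include "atelico_ffi.h" instead.
struct AtelicoEngine { int dummy; };
constexpr int32_t ATELICO_OK = 0;
int32_t atelico_engine_create(const void*, AtelicoEngine** out)
{
    *out = new AtelicoEngine{};
    return ATELICO_OK;
}
void atelico_engine_destroy(AtelicoEngine* e) { delete e; }
// --- End stubs.

// Deleter that routes unique_ptr destruction through the FFI.
// atelico_engine_destroy is documented as safe on nullptr, but we
// guard anyway to keep the sketch defensive.
struct EngineDeleter
{
    void operator()(AtelicoEngine* e) const
    {
        if (e) atelico_engine_destroy(e);
    }
};

using EnginePtr = std::unique_ptr<AtelicoEngine, EngineDeleter>;

// Returns an owning handle, or nullptr on failure.
EnginePtr MakeEngine()
{
    AtelicoEngine* raw = nullptr;
    if (atelico_engine_create(nullptr, &raw) != ATELICO_OK)
        return nullptr;
    return EnginePtr(raw);
}
```

With this wrapper, the early-return error paths in Steps 3–5 no longer need explicit `atelico_engine_destroy` calls.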
Step 3: Load a Model
// Pre-load a model (blocking — downloads if not cached)
rc = atelico_model_load(engine, "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M");
if (rc != ATELICO_OK)
{
fprintf(stderr, "Failed to load model: %s\n", atelico_last_error());
atelico_engine_destroy(engine);
return 1;
}
printf("Model loaded\n");
Step 4: Blocking Chat Request
const char* request = R"({
"model": "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "system", "content": "You are a friendly tavern keeper. Keep responses under 2 sentences."},
{"role": "user", "content": "What's on the menu?"}
],
"max_tokens": 100,
"temperature": 0.7
})";
const char* response = nullptr;
rc = atelico_llm_chat(engine, request, &response);
if (rc == ATELICO_OK && response != nullptr)
{
printf("Response: %s\n", response);
// IMPORTANT: response is valid only until the next API call on this thread.
// Copy it if you need to keep it.
}
else
{
fprintf(stderr, "Chat failed: %s\n", atelico_last_error());
}
Step 5: Streaming with the Poll Pattern
Streaming is the recommended approach for real-time applications. It uses a poll loop that fits naturally into a game's frame update:
// Start a streaming request — returns immediately
uint64_t stream_id = 0;
rc = atelico_llm_chat_stream(engine, request, &stream_id);
if (rc != ATELICO_OK)
{
fprintf(stderr, "Failed to start stream: %s\n", atelico_last_error());
return 1;
}
// Poll for tokens (in a game, do this once per frame in your update loop)
bool done = false;
while (!done)
{
const char* chunk_json = nullptr;
int32_t poll_rc = atelico_stream_poll(engine, stream_id, &chunk_json);
switch (poll_rc)
{
case ATELICO_OK:
// Got a token chunk — print it
// chunk_json is valid until the next API call
printf("%s", chunk_json);
fflush(stdout);
break;
case ATELICO_ERR_STREAM_EMPTY:
// No data yet — in a game loop, just continue to the next frame
// In a console app, sleep briefly to avoid busy-waiting,
// e.g. std::this_thread::sleep_for(std::chrono::milliseconds(5))
break;
case ATELICO_ERR_STREAM_DONE:
// Stream finished
done = true;
break;
default:
// Error
fprintf(stderr, "\nStream error: %s\n", atelico_last_error());
done = true;
break;
}
}
// Clean up the stream handle
atelico_stream_destroy(engine, stream_id);
printf("\nDone.\n");
In a Game Loop
The poll pattern is designed for frame-driven engines:
// Called every frame by your engine
void OnUpdate()
{
// Signal frame timing to the AI scheduler
atelico_engine_on_frame(engine);
// Poll active stream for new tokens
if (active_stream_id != 0)
{
const char* chunk = nullptr;
int32_t rc = atelico_stream_poll(engine, active_stream_id, &chunk);
if (rc == ATELICO_OK && chunk != nullptr)
{
// Append token to dialogue UI
AppendToDialogue(chunk);
}
else if (rc == ATELICO_ERR_STREAM_DONE)
{
atelico_stream_destroy(engine, active_stream_id);
active_stream_id = 0;
OnDialogueComplete();
}
// STREAM_EMPTY: no data this frame, try next frame
}
}
Step 6: Error Handling
All functions return int32_t result codes. On failure, call atelico_last_error() for a human-readable message:
int32_t rc = atelico_llm_chat(engine, bad_request, &response);
if (rc != ATELICO_OK)
{
const char* error = atelico_last_error();
// error is thread-local and valid until the next API call
fprintf(stderr, "Error (code %d): %s\n", rc, error);
}
| Code | Constant | Meaning |
|---|---|---|
| 0 | ATELICO_OK | Success |
| -1 | ATELICO_ERR_INVALID_HANDLE | NULL or invalid engine pointer |
| -2 | ATELICO_ERR_INVALID_ARG | NULL required argument |
| -3 | ATELICO_ERR_INIT_FAILED | Engine initialization failed |
| -4 | ATELICO_ERR_MODEL_NOT_FOUND | Model ID not recognized |
| -5 | ATELICO_ERR_INFERENCE_FAILED | Inference error |
| -6 | ATELICO_ERR_STREAM_DONE | Stream completed (not an error) |
| -7 | ATELICO_ERR_STREAM_EMPTY | No data available yet (not an error) |
| -8 | ATELICO_ERR_JSON_PARSE | Invalid JSON in request |
| -9 | ATELICO_ERR_STORE_NOT_FOUND | KV store not found |
| -10 | ATELICO_ERR_IO | I/O error |
| -11 | ATELICO_ERR_BLOCKED | Guardrail blocked content |
| -99 | ATELICO_ERR_INTERNAL | Internal error |
String Ownership
Strings you pass to the API: The API reads them during the call. You retain ownership and can free them after the call returns.
Strings returned by the API: Stored in a thread-local buffer. Valid only until the next API call on the same thread. Copy immediately if you need to keep them:
const char* response = nullptr;
atelico_llm_chat(engine, request, &response);
// Copy before making another API call
std::string saved_response = response; // C++ copies here
// Now safe to make another call — response pointer is now invalid
atelico_llm_chat(engine, another_request, &response);
GPU Scheduling
Control GPU resource sharing at runtime:
// Scheduling modes
atelico_engine_set_scheduling_mode(engine, ATELICO_SCHEDULE_BALANCE); // Default
atelico_engine_set_scheduling_mode(engine, ATELICO_SCHEDULE_PRIORITIZE_COMPUTE); // Fast AI
atelico_engine_set_scheduling_mode(engine, ATELICO_SCHEDULE_PRIORITIZE_GRAPHICS); // Smooth FPS
// Resource limits
atelico_engine_set_vram_budget_mb(engine, 4096); // Cap VRAM
atelico_engine_set_target_tps(engine, 15); // Limit tokens/sec
atelico_engine_set_frame_time_ms(engine, 16); // 60 FPS hint
Complete Example
#include "atelico_ffi.h"
#include <cstdio>
#include <cstring>
#include <string>
int main()
{
// Create engine
AtelicoEngine* engine = nullptr;
if (atelico_engine_create(nullptr, &engine) != ATELICO_OK)
{
fprintf(stderr, "Init failed: %s\n", atelico_last_error());
return 1;
}
// Load model
if (atelico_model_load(engine, "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M") != ATELICO_OK)
{
fprintf(stderr, "Model load failed: %s\n", atelico_last_error());
atelico_engine_destroy(engine);
return 1;
}
// Stream a chat response
const char* request = R"({
"model": "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "system", "content": "You are a narrator for a fantasy RPG."},
{"role": "user", "content": "Describe the entrance to the dungeon."}
],
"max_tokens": 200,
"temperature": 0.8
})";
uint64_t stream = 0;
if (atelico_llm_chat_stream(engine, request, &stream) != ATELICO_OK)
{
fprintf(stderr, "Stream failed: %s\n", atelico_last_error());
atelico_engine_destroy(engine);
return 1;
}
printf("NPC: ");
bool done = false;
while (!done)
{
const char* chunk = nullptr;
int32_t rc = atelico_stream_poll(engine, stream, &chunk);
if (rc == ATELICO_OK)
printf("%s", chunk);
else if (rc == ATELICO_ERR_STREAM_DONE)
done = true;
else if (rc != ATELICO_ERR_STREAM_EMPTY)
{
fprintf(stderr, "\nError: %s\n", atelico_last_error());
done = true;
}
}
printf("\n");
atelico_stream_destroy(engine, stream);
atelico_engine_destroy(engine);
return 0;
}
Audio: TTS & STT
Two functions cover speech synthesis (Kokoro / Pocket TTS) and transcription (Whisper); a third gives you sentence-by-sentence streaming synthesis. Audio bytes cross the FFI boundary as base64-encoded WAV files.
// Blocking TTS — returns base64 WAV in the response JSON
const char* req = "{\"model\":\"in-memory::tts\",\"input\":\"Hello.\",\"voice\":\"af_heart\"}";
const char* resp = NULL;
atelico_audio_synthesize(engine, req, &resp);
// resp: {"audio_b64":"UklGRn...","duration_seconds":1.42,"format":"wav","sample_rate":24000}
// Streaming TTS — chunks per sentence via the same poll pattern as chat
uint64_t stream = 0;
atelico_audio_synthesize_stream(engine,
"{\"model\":\"in-memory::pocket-tts\",\"input\":\"First. Second.\",\"voice\":\"alba\"}",
&stream);
const char* chunk = NULL;
while (1) {
int rc = atelico_stream_poll(engine, stream, &chunk);
if (rc == ATELICO_OK) {
// chunk is an AudioSpeechChunk JSON: {sequence, audio (b64), duration_seconds, text}
play_chunk(chunk);
} else if (rc == ATELICO_ERR_STREAM_DONE) {
break;
} else if (rc == ATELICO_ERR_STREAM_EMPTY) {
sleep_one_frame(); // try again next frame
}
}
atelico_stream_destroy(engine, stream);
// Blocking STT — pass a base64-encoded WAV file
const char* stt_req = "{\"model\":\"in-memory::whisper\",\"audio_b64\":\"UklGRn...\"}";
const char* stt_resp = NULL;
atelico_audio_transcribe(engine, stt_req, &stt_resp);
// stt_resp: {"text":"Hello.","language":"en","duration":1.0}
Whisper variant ids: whisper (default → whisper-base.en), whisper-tiny[.en], whisper-base[.en], whisper-small[.en], whisper-medium[.en], whisper-large-v3[-turbo], distil-large-v3. TTS ids: tts (default → kokoro-82m), kokoro, kokoro-82m, pocket, pocket-tts. See the Audio guide for the full feature matrix.
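Because audio crosses the FFI boundary as base64-encoded WAV, the native side needs a base64 decoder before handing bytes to an audio device (and an encoder for STT input). A minimal standalone sketch of the decode direction, standard alphabet only, with no validation of malformed input:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode a standard-alphabet base64 string into raw bytes.
// Skips characters outside the alphabet (e.g. whitespace) and
// stops at '=' padding.
std::vector<uint8_t> base64_decode(const std::string& in)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> out;
    uint32_t buffer = 0;
    int bits = 0;
    for (char c : in)
    {
        if (c == '=') break;                    // padding: done
        size_t idx = alphabet.find(c);
        if (idx == std::string::npos) continue; // skip non-alphabet chars
        buffer = (buffer << 6) | static_cast<uint32_t>(idx);
        bits += 6;
        if (bits >= 8)
        {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

The decoded bytes are a complete WAV file (header included), so you can feed them to any WAV-capable player; many platforms also ship a base64 codec you can use instead (e.g. Foundation on Apple platforms).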
Next Steps
- C FFI API Reference — full list of all functions, constants, and types
- Chat Completions API — detailed API reference (same JSON format)
- Audio (TTS & STT) — voices, voice cloning, quantization, env vars
- Structured Generation — constrain output to JSON schemas