Version: 0.7

Unity API Reference

AtelicoEngine

Main Atelico AI Engine singleton. Add to a GameObject to initialize the engine. Persists across scene loads via DontDestroyOnLoad. Provides access to all AI subsystems (LLM, images, embeddings, classifiers, guardrails, LoRA, KV store, ANN) through typed accessor properties.

Properties

  • Instance (AtelicoEngine): Global singleton instance. Set during Awake and cleared on destroy. Returns null if no AtelicoEngine exists in the scene.

  • Llm (AtelicoLlm): LLM subsystem for chat completions, text completions, the Responses API, and templated LM function calls. Null if the engine failed to initialize.

  • Images (AtelicoImages): Image generation subsystem for text-to-image generation and background removal. Null if the engine failed to initialize.

  • Embeddings (AtelicoEmbeddings): Embedding subsystem for generating text embeddings and computing semantic similarity. Null if the engine failed to initialize.

  • Classifiers (AtelicoClassifiers): Classifier subsystem for embedding-based text classification (centroid and KNN/HNSW). Null if the engine failed to initialize.

  • Guardrails (AtelicoGuardrails): Guardrails subsystem for content safety checking of user inputs, model outputs, and image generation prompts. Null if the engine failed to initialize.

  • Lora (AtelicoLora): LoRA adapter management subsystem for loading, unloading, and scaling adapters on top of loaded base models. Null if the engine failed to initialize.

  • KvStore (AtelicoKvStore): Key-value store subsystem with embedding-backed semantic queries for storing and searching game data (NPC dialogue, lore, inventory). Null if the engine failed to initialize.

  • Ann (AtelicoAnn): Approximate Nearest Neighbor (ANN) index subsystem for vector search, backed by HNSW. A pure data structure that operates on pre-computed vectors. Null if the engine failed to initialize.
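
Before using any subsystem, it is worth guarding against a missing engine or failed initialization. A minimal sketch (the class name and checks are illustrative):

using UnityEngine;

public class EngineReadyCheck : MonoBehaviour
{
    void Start()
    {
        // Instance is set in the engine's Awake, so script execution order matters.
        var engine = AtelicoEngine.Instance;
        if (engine == null)
        {
            Debug.LogError("No AtelicoEngine in the scene.");
            return;
        }

        // Each subsystem property is null if the engine failed to initialize.
        if (engine.Llm == null)
        {
            Debug.LogError("Atelico engine failed to initialize.");
        }
    }
}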

Methods

void RegisterStream(ulong streamId, StreamCallbacks callbacks)

Registers a stream for per-frame polling in Update. Called by subsystems (e.g., AtelicoLlm.ChatCompletionStream) after starting a streaming operation. Not intended for direct use by game code.

  • streamId: Native stream handle returned by the C ABI.

  • callbacks: Callback set invoked as stream data arrives, completes, or errors.

void SetSchedulingMode(SchedulingMode mode)

Set the GPU scheduling mode to control how rendering and inference share GPU time. Takes effect immediately for subsequent inference operations. PrioritizeCompute minimizes AI latency; Balance is the default; PrioritizeGraphics maximizes FPS.

  • mode: The desired scheduling priority balance (PrioritizeCompute, Balance, or PrioritizeGraphics).
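
As a sketch (the event method names are illustrative), a game might favor inference while a loading screen hides the frame rate, then hand priority back to rendering:

using UnityEngine;

public class SchedulingModeExample : MonoBehaviour
{
    void OnLoadingScreenShown()
    {
        // Minimize AI latency while nothing is being rendered to the player.
        AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeCompute);
    }

    void OnGameplayResumed()
    {
        // Maximize FPS once gameplay resumes.
        AtelicoEngine.Instance.SetSchedulingMode(SchedulingMode.PrioritizeGraphics);
    }
}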

SchedulingMode GetSchedulingMode()

Get the current GPU scheduling mode. Returns Balance if the engine is not initialized.

Returns: The active SchedulingMode value (PrioritizeCompute, Balance, or PrioritizeGraphics).

void SetVramBudgetMb(uint mb)

Set the maximum VRAM budget in megabytes for AI model storage. The engine will avoid loading models that would exceed this limit.

  • mb: VRAM budget in MB. Use 0 for unlimited.

void SetTargetTps(uint tps)

Set the target tokens-per-second rate for inference pacing. The engine will yield GPU time between tokens to stay near this rate, freeing GPU cycles for rendering.

  • tps: Target tokens per second. Use 0 for unlimited (fastest possible).

void SetFrameTimeMs(uint ms)

Set the target frame time in milliseconds for frame-budget-aware scheduling. The engine uses this to decide how much GPU work to perform per frame.

  • ms: Target frame time in milliseconds. For example, 16 for 60fps, 33 for 30fps. Use 0 to disable frame budgeting.
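
The three budget setters above compose into one performance profile. A sketch with illustrative values for a 60fps target:

using UnityEngine;

public class AiPerformanceSetup : MonoBehaviour
{
    void Start()
    {
        var engine = AtelicoEngine.Instance;
        engine.SetVramBudgetMb(2048); // cap AI model storage at ~2 GB of VRAM
        engine.SetTargetTps(20);      // pace generation near 20 tokens per second
        engine.SetFrameTimeMs(16);    // budget GPU work around a 16 ms (60fps) frame
    }
}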

bool LoadModel(string modelId)

Load a model synchronously. Blocks the calling thread until the model is fully loaded into memory and ready for inference. The model will be downloaded from HuggingFace Hub if not already cached locally.

  • modelId: Model identifier, typically in the format "in-memory::org/model-name" (e.g., "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF").

Returns: True if the model loaded successfully; false on error (e.g., invalid model ID, download failure, insufficient VRAM).
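
Because the call blocks, invoking it from Unity's main thread stalls rendering until the load (and, on first run, the download) finishes; timing it behind a loading screen is advisable. A minimal sketch:

using UnityEngine;

public class ModelBootstrap : MonoBehaviour
{
    void Start()
    {
        const string modelId = "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF";
        if (!AtelicoEngine.Instance.LoadModel(modelId)) // blocking
        {
            Debug.LogError($"Failed to load {modelId}");
        }
    }
}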

bool UnloadModel(string modelId)

Unload a model, freeing its GPU and system memory. Any in-flight inference using this model will be cancelled.

  • modelId: Model identifier previously passed to LoadModel.

Returns: True if the model was found and unloaded; false if the model was not loaded or on error.

string ListModelsJson()

List all currently loaded models. Returns a JSON array of model descriptors. Returns "[]" if no models are loaded or on error.

Response JSON schema:

[
  {
    "id": "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF",
    "type": "llm",
    "size_bytes": 1234567890
  }
]
  • id (string): The model identifier.
  • type (string): Model type ("llm", "embedding", "image", "classifier").
  • size_bytes (int): Approximate memory usage in bytes.

Returns: JSON array string of model descriptor objects (see the schema above).
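
Unity's JsonUtility cannot deserialize a top-level JSON array, so one common workaround is to wrap the returned string in an object before parsing. A sketch (the DTO and wrapper names are illustrative):

using System;
using UnityEngine;

public class ModelLister : MonoBehaviour
{
    [Serializable]
    class ModelDescriptor { public string id; public string type; public long size_bytes; }

    [Serializable]
    class Wrapper { public ModelDescriptor[] items; }

    void Start()
    {
        string json = AtelicoEngine.Instance.ListModelsJson();
        // Wrap the array in an object so JsonUtility can parse it.
        var wrapper = JsonUtility.FromJson<Wrapper>("{\"items\":" + json + "}");
        foreach (var m in wrapper.items)
            Debug.Log($"{m.id} ({m.type}): {m.size_bytes / (1024f * 1024f):F1} MB");
    }
}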

bool IsLoaded(string modelId)

Check whether a model is currently loaded and ready for inference.

  • modelId: Model identifier to check.

Returns: True if the model is loaded and ready; false otherwise.

AtelicoAnn

Approximate Nearest Neighbor (ANN) index for vector search, backed by HNSW. A pure data structure that operates on pre-computed vectors; no GPU or models are required. Access via AtelicoEngine.Ann.

Methods

ulong Create(string configJson)

Create a new ANN index with the specified vector dimensionality and capacity.

Configuration JSON schema:

{
  "dim": 384,
  "max_elements": 10000,
  "m": 16,
  "ef_construction": 200,
  "ef_search": 50
}
  • dim (int, required): Vector dimensionality (must match inserted vectors).

  • max_elements (int, required): Maximum number of vectors the index can hold.

  • m (int): HNSW connections per node (default: 16). Higher values improve recall at the cost of memory.

  • ef_construction (int): Build-time search width (default: 200). Higher values improve index quality.

  • ef_search (int): Query-time search width (default: 50). Higher values improve recall at query time.

  • configJson: JSON string containing the index configuration (see the schema above).

Returns: Index handle (ulong) for use with other ANN methods, or 0 on error.

bool Build(ulong indexId)

Build the HNSW index graph. Must be called after all insertions and before any searches. This operation is O(n * log(n)) and may take noticeable time for large indices.

  • indexId: Index handle returned by Create.

Returns: True on success; false on error.

bool Destroy(ulong indexId)

Destroy an ANN index and free all associated memory.

  • indexId: Index handle returned by Create.

Returns: True if the index was found and destroyed; false otherwise.
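
Putting these together, a sketch of the index lifecycle (vector insertion and search calls are omitted, as those methods are not part of this excerpt):

using UnityEngine;

public class AnnIndexExample : MonoBehaviour
{
    void Start()
    {
        var ann = AtelicoEngine.Instance.Ann;

        // dim and max_elements are required; the HNSW knobs fall back to defaults.
        ulong index = ann.Create("{\"dim\": 384, \"max_elements\": 10000}");
        if (index == 0)
        {
            Debug.LogError("ANN index creation failed.");
            return;
        }

        // ... insert vectors here, then build before searching ...
        if (!ann.Build(index))
            Debug.LogError("ANN build failed.");

        // Free the index once it is no longer needed.
        ann.Destroy(index);
    }
}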

AtelicoClassifiers

Classifier subsystem for embedding-based text classification. Supports centroid and KNN/HNSW classifiers. Access via AtelicoEngine.Classifiers.

Methods

string Predict(string requestJson)

Predict the class of input text using a loaded classifier model (blocking).

Request JSON schema:

{
  "model_id": "intent-classifier",
  "text": "I want to buy a health potion",
  "top_k": 3
}
  • model_id (string, required): Classifier model ID.
  • text (string, required): Input text to classify.
  • top_k (int): Number of top predictions to return (default: 1).

Response JSON schema:

{
  "label": "purchase",
  "probability": 0.92,
  "top": [
    { "label": "purchase", "probability": 0.92 },
    { "label": "inquiry", "probability": 0.05 },
    { "label": "combat", "probability": 0.03 }
  ]
}
  • label (string): Predicted class label with highest probability.

  • probability (float): Confidence score for the top prediction.

  • top (array): Array of top-k predictions, each with label and probability.

  • requestJson: JSON string containing the classification request (see the request schema above).

Returns: JSON string containing the classification result with predicted label and confidence, or null on error (see the response schema above).
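
A sketch of calling Predict and reading the top prediction, assuming a classifier loaded under the illustrative ID "intent-classifier" (JsonUtility simply ignores the nested top array unless a matching field is declared):

using System;
using UnityEngine;

public class IntentExample : MonoBehaviour
{
    [Serializable]
    class Prediction { public string label; public float probability; }

    void Start()
    {
        string request = "{\"model_id\": \"intent-classifier\", " +
                         "\"text\": \"I want to buy a health potion\", \"top_k\": 3}";
        string response = AtelicoEngine.Instance.Classifiers.Predict(request);
        if (response == null)
        {
            Debug.LogError("Prediction failed.");
            return;
        }

        var result = JsonUtility.FromJson<Prediction>(response);
        Debug.Log($"Intent: {result.label} ({result.probability:P0})");
    }
}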

AtelicoEmbeddings

Embedding subsystem for generating text embeddings and computing semantic similarity. Access via AtelicoEngine.Embeddings.

Methods

string Embed(string requestJson)

Generate embedding vectors for one or more input texts (blocking). Follows the OpenAI Embeddings API format.

Request JSON schema:

{
  "model": "in-memory::sentence-transformers/all-MiniLM-L6-v2",
  "input": ["The knight draws his sword.", "A warrior unsheathes a blade."]
}
  • model (string, required): Embedding model ID in "backend::org/model" format.
  • input (string or array of strings, required): Text(s) to embed.

Response JSON schema:

{
  "data": [
    { "embedding": [0.012, -0.034, ...], "index": 0 },
    { "embedding": [0.011, -0.033, ...], "index": 1 }
  ],
  "model": "in-memory::sentence-transformers/all-MiniLM-L6-v2",
  "usage": { "prompt_tokens": 14, "total_tokens": 14 }
}

Each element in data contains the float embedding vector and its index matching the input order.

  • requestJson: JSON string containing the embedding request (see the request schema above).

Returns: JSON string containing embedding vectors for each input text, or null on error (see the response schema above).
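
A sketch of embedding one text and reading the vector back (DTO names are illustrative; response fields without matching DTO fields, such as usage, are ignored by JsonUtility):

using System;
using UnityEngine;

public class EmbedExample : MonoBehaviour
{
    [Serializable] class Item { public float[] embedding; public int index; }
    [Serializable] class Response { public Item[] data; }

    void Start()
    {
        string request =
            "{\"model\": \"in-memory::sentence-transformers/all-MiniLM-L6-v2\", " +
            "\"input\": [\"The knight draws his sword.\"]}";
        string json = AtelicoEngine.Instance.Embeddings.Embed(request);
        if (json == null)
        {
            Debug.LogError("Embedding failed.");
            return;
        }

        var response = JsonUtility.FromJson<Response>(json);
        Debug.Log($"Vector length: {response.data[0].embedding.Length}");
    }
}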

float Similarity(string modelId, string textA, string textB)

Compute cosine similarity between two texts using the specified embedding model. Embeds both texts and computes their similarity in a single call.

  • modelId: Embedding model ID to use, in "backend::org/model" format.

  • textA: First text to compare.

  • textB: Second text to compare.

Returns: Cosine similarity score (float) in the range -1.0 to 1.0, where 1.0 means identical meaning and -1.0 means opposite.
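
For one-off comparisons this avoids calling Embed twice and computing cosine similarity by hand. A minimal sketch:

using UnityEngine;

public class SimilarityExample : MonoBehaviour
{
    void Start()
    {
        float score = AtelicoEngine.Instance.Embeddings.Similarity(
            "in-memory::sentence-transformers/all-MiniLM-L6-v2",
            "The knight draws his sword.",
            "A warrior unsheathes a blade.");
        Debug.Log($"Similarity: {score:F2}"); // close paraphrases score near 1.0
    }
}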

AtelicoGuardrails

Guardrails subsystem for content safety checking of user inputs, model outputs, and image generation prompts. Access via AtelicoEngine.Guardrails.

Methods

string CheckInput(string text)

Check user input text against configured safety guardrails before sending to a model.

Response JSON schema:

{
  "action": "allow",
  "checker_name": "content-safety",
  "score": 0.01,
  "reason": "Content is safe."
}
  • action (string): "allow", "block", or "warn".

  • checker_name (string): Name of the guardrail checker that triggered.

  • score (float, optional): Confidence score from the checker.

  • reason (string, optional): Human-readable reason for the verdict.

  • text: The user input text to check.

Returns: JSON SafetyVerdict string, or null on error (see the response schema above).
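
A sketch of gating player chat on the verdict's action field (treating errors as unsafe is a design choice here, not a library requirement):

using System;
using UnityEngine;

public class ChatInputFilter : MonoBehaviour
{
    [Serializable] class Verdict { public string action; public string reason; }

    public bool IsAllowed(string playerText)
    {
        string json = AtelicoEngine.Instance.Guardrails.CheckInput(playerText);
        if (json == null) return false; // fail closed on errors

        var verdict = JsonUtility.FromJson<Verdict>(json);
        if (verdict.action == "block")
        {
            Debug.LogWarning($"Input blocked: {verdict.reason}");
            return false;
        }
        return true; // "allow" or "warn"
    }
}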

string CheckOutput(string text)

Check model output text against configured safety guardrails before displaying to the user.

Response JSON schema:

{
  "action": "allow",
  "checker_name": "content-safety",
  "score": 0.02,
  "reason": "Content is safe."
}
  • action (string): "allow", "block", or "warn".

  • checker_name (string): Name of the guardrail checker that triggered.

  • score (float, optional): Confidence score from the checker.

  • reason (string, optional): Human-readable reason for the verdict.

  • text: The model output text to check.

Returns: JSON SafetyVerdict string, or null on error (same schema as CheckInput).

string CheckImagePrompt(string prompt)

Check an image generation prompt against safety guardrails before generating.

Response JSON schema:

{
  "action": "allow",
  "checker_name": "image-safety",
  "score": 0.01,
  "reason": "Content is safe."
}
  • action (string): "allow", "block", or "warn".

  • checker_name (string): Name of the guardrail checker that triggered.

  • score (float, optional): Confidence score from the checker.

  • reason (string, optional): Human-readable reason for the verdict.

  • prompt: The image generation prompt text to check.

Returns: JSON SafetyVerdict string, or null on error (same schema as CheckInput).

AtelicoImages

Image generation subsystem providing text-to-image generation and background removal. Access via AtelicoEngine.Images.

Methods

string Generate(string requestJson)

Generate an image from a text prompt synchronously (blocking). Follows the OpenAI Images API format.

Request JSON schema:

{
  "model": "in-memory::PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
  "prompt": "A medieval castle at sunset, fantasy art style",
  "size": "512x512",
  "n": 1,
  "response_format": "b64_json"
}
  • model (string, required): Image model ID in "backend::org/model" format.
  • prompt (string, required): Text description of the desired image.
  • size (string): Image dimensions as "WxH" (e.g., "512x512", "1024x1024"). Default: "512x512".
  • n (int): Number of images to generate (default: 1).
  • response_format (string): "b64_json" (default) or "url".

Response JSON schema:

{
  "created": 1700000000,
  "data": [
    { "b64_json": "iVBORw0KGgoAAAANSUhEUg..." }
  ]
}

Each element in data contains a base64-encoded PNG image.

  • requestJson: JSON string containing the image generation request (see the request schema above).

Returns: JSON string containing the generated image data as base64, or null on error (see the response schema above).
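
A sketch of generating an image and displaying it, decoding the base64 PNG via Texture2D.LoadImage (assumes a Renderer on the same GameObject; prompt and model ID mirror the schema example above):

using System;
using UnityEngine;

public class ImageGenExample : MonoBehaviour
{
    [Serializable] class Img { public string b64_json; }
    [Serializable] class Resp { public Img[] data; }

    void Start()
    {
        string request =
            "{\"model\": \"in-memory::PixArt-alpha/PixArt-Sigma-XL-2-1024-MS\", " +
            "\"prompt\": \"A medieval castle at sunset, fantasy art style\", " +
            "\"size\": \"512x512\"}";
        string json = AtelicoEngine.Instance.Images.Generate(request); // blocking
        if (json == null)
        {
            Debug.LogError("Image generation failed.");
            return;
        }

        var resp = JsonUtility.FromJson<Resp>(json);
        byte[] png = Convert.FromBase64String(resp.data[0].b64_json);
        var tex = new Texture2D(2, 2);
        tex.LoadImage(png); // resizes the texture to the decoded PNG
        GetComponent<Renderer>().material.mainTexture = tex;
    }
}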

string RemoveBackground(string requestJson)

Remove the background from an image synchronously (blocking). Returns a PNG with transparent background.

Request JSON schema:

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "model": "optional-model-id"
}
  • image (string, required): Base64-encoded input image (PNG or JPEG).
  • model (string): Background removal model ID (uses default if omitted).

Response JSON schema:

{
  "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
}

The b64_json field contains a base64-encoded PNG with transparent background.

  • requestJson: JSON string containing the base64-encoded input image (see the request schema above).

Returns: JSON string containing the base64-encoded result image with transparent background, or null on error.

AtelicoKvStore

Key-value store subsystem with embedding-backed semantic queries. Stores text entries with automatic embedding generation, enabling semantic search over game data (e.g., NPC dialogue, lore, inventory). Access via AtelicoEngine.KvStore.

Methods

ulong Create(string configJson)

Create a new KV store backed by an embedding model for semantic search over game data.

Configuration JSON schema:

{
  "embedding_model": "in-memory::sentence-transformers/all-MiniLM-L6-v2",
  "store_id": "npc-dialogue"
}
  • embedding_model (string, required): Model ID for embedding entries.

  • store_id (string): Optional custom store identifier; auto-generated if omitted.

  • configJson: JSON string containing the store configuration (see the schema above).

Returns: Store ID handle (ulong) for use with other KvStore methods, or 0 on failure.

bool Insert(string storeId, string entriesJson)

Insert one or more entries into a KV store. Each entry is automatically embedded for semantic search.

Entries JSON schema:

[
  {
    "key": "greeting-1",
    "value": "Welcome, traveler!",
    "metadata": { "npc": "innkeeper" }
  },
  {
    "key": "greeting-2",
    "value": "Halt! State your business.",
    "metadata": { "npc": "guard" }
  }
]
  • key (string, required): Unique key for the entry.

  • value (string, required): Text content to store and embed.

  • metadata (object, optional): Arbitrary metadata for filtering.

  • storeId: Store identifier returned by Create.

  • entriesJson: JSON array of entry objects to insert (see the schema above).

Returns: True on success; false on error.

string Query(string storeId, string queryJson)

Query a KV store using semantic search to find entries similar to a query text. Results are ranked by cosine similarity to the query embedding.

Query JSON schema:

{
  "query": "hello friend",
  "top_k": 2,
  "filter": "npc == 'innkeeper'"
}
  • query (string, required): Text to search for semantically.
  • top_k (int): Maximum number of results (default: 5).
  • filter (string, optional): Metadata filter expression.

Response JSON schema:

[
  {
    "key": "greeting-1",
    "value": "Welcome, traveler!",
    "score": 0.89,
    "metadata": { "npc": "innkeeper" }
  }
]

Each result contains the entry key, value, similarity score, and metadata.

  • storeId: Store identifier returned by Create.

  • queryJson: JSON string containing the query parameters (see the query schema above).

Returns: JSON array of matching entries sorted by descending similarity score, or null on error (see the response schema above).
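
Note that Create returns a numeric handle while Insert and Query take a string identifier; the sketch below assumes the store_id supplied in the configuration is the string those methods expect:

using UnityEngine;

public class DialogueStoreExample : MonoBehaviour
{
    void Start()
    {
        var kv = AtelicoEngine.Instance.KvStore;

        // Supply an explicit store_id so the store can be referenced by name.
        ulong handle = kv.Create(
            "{\"embedding_model\": \"in-memory::sentence-transformers/all-MiniLM-L6-v2\", " +
            "\"store_id\": \"npc-dialogue\"}");
        if (handle == 0)
        {
            Debug.LogError("Store creation failed.");
            return;
        }

        kv.Insert("npc-dialogue",
            "[{\"key\": \"greeting-1\", \"value\": \"Welcome, traveler!\", " +
            "\"metadata\": {\"npc\": \"innkeeper\"}}]");

        string results = kv.Query("npc-dialogue",
            "{\"query\": \"hello friend\", \"top_k\": 1}");
        Debug.Log(results ?? "Query failed.");
    }
}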

string Scan(string storeId, string filter, uint limit)

Scan entries in a KV store with an optional metadata filter. Unlike Query, this does not perform semantic search; it returns entries matching the filter in insertion order.

Response JSON schema:

[
  {
    "key": "greeting-1",
    "value": "Welcome, traveler!",
    "metadata": { "npc": "innkeeper" }
  }
]

Each entry contains its key, value, and metadata.

  • storeId: Store identifier returned by Create.

  • filter: Optional metadata filter expression, or null for no filter.

  • limit: Maximum number of entries to return.

Returns: JSON array of matching entries in insertion order, or null on error.

ulong Count(string storeId, string filter)

Count entries in a KV store matching an optional metadata filter.

  • storeId: Store identifier returned by Create.

  • filter: Optional metadata filter expression, or null to count all entries.

Returns: Number of matching entries (ulong), or 0 on error.

bool Destroy(string storeId)

Destroy a KV store and release all associated resources (embeddings, index, storage).

  • storeId: Store identifier returned by Create.

Returns: True if the store was found and destroyed; false otherwise.

AtelicoLlm

LLM subsystem providing chat completions, text completions, the Responses API, and templated LM function calls. Access via AtelicoEngine.Llm.

Methods

string ChatCompletion(string requestJson)

Synchronous (blocking) chat completion following the OpenAI Chat Completions API format. Blocks the calling thread until the full response is generated.

Request JSON schema:

{
  "model": "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF",
  "messages": [
    { "role": "system", "content": "You are a helpful NPC." },
    { "role": "user", "content": "What quests are available?" }
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "top_p": 1.0,
  "response_format": { "type": "json_schema", "json_schema": { ... } }
}
  • model (string, required): Model ID in "backend::org/model" format.
  • messages (array, required): Conversation messages, each with role ("system", "user", or "assistant") and content (string).
  • max_tokens (int): Maximum tokens to generate (default: 256).
  • temperature (float): Sampling temperature, 0.0-2.0 (default: 0.7).
  • top_p (float): Nucleus sampling threshold (default: 1.0).
  • response_format (object): Optional structured output constraint.

Response JSON schema:

{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 25, "completion_tokens": 40, "total_tokens": 65 }
}
  • requestJson: JSON string containing the chat completion request (see the request schema above).

Returns: JSON string containing the chat completion response, or null on error (see the response schema above).
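
A sketch of a blocking request with JsonUtility parsing of the nested choices array; for interactive dialogue, the streaming path (ChatCompletionStream, mentioned under RegisterStream) avoids stalling the frame:

using System;
using UnityEngine;

public class NpcChatExample : MonoBehaviour
{
    [Serializable] class Message { public string role; public string content; }
    [Serializable] class Choice { public Message message; public string finish_reason; }
    [Serializable] class Response { public Choice[] choices; }

    void Start()
    {
        string request =
            "{\"model\": \"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF\", " +
            "\"messages\": [" +
            "{\"role\": \"system\", \"content\": \"You are a helpful NPC.\"}, " +
            "{\"role\": \"user\", \"content\": \"What quests are available?\"}], " +
            "\"max_tokens\": 128}";
        string json = AtelicoEngine.Instance.Llm.ChatCompletion(request); // blocking
        if (json == null)
        {
            Debug.LogError("Chat completion failed.");
            return;
        }

        var response = JsonUtility.FromJson<Response>(json);
        Debug.Log(response.choices[0].message.content);
    }
}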

string TextCompletion(string requestJson)

Synchronous (blocking) text completion. Continues a raw prompt without chat formatting.

Request JSON schema:

{
  "model": "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF",
  "prompt": "The dragon descended upon the village and",
  "max_tokens": 100,
  "temperature": 0.8
}
  • model (string, required): Model ID in "backend::org/model" format.
  • prompt (string, required): Text prompt to continue.
  • max_tokens (int): Maximum tokens to generate.
  • temperature (float): Sampling temperature, 0.0-2.0.

Response JSON schema:

{
  "id": "cmpl-abc123",
  "choices": [
    { "text": " breathed fire across the rooftops...", "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 8, "completion_tokens": 12, "total_tokens": 20 }
}
  • requestJson: JSON string containing the completion request (see the request schema above).

Returns: JSON string containing the completion response, or null on error (see the response schema above).

string Respond(string requestJson)

Synchronous response request following the OpenAI Responses API format.

Request JSON schema:

{
  "model": "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF",
  "input": "What is the capital of France?",
  "instructions": "Answer concisely.",
  "max_output_tokens": 50,
  "temperature": 0.7
}
  • model (string, required): Model ID in "backend::org/model" format.
  • input (string or array): User input text or message array.
  • instructions (string): System instructions.
  • max_output_tokens (int): Maximum tokens to generate.
  • temperature (float): Sampling temperature, 0.0-2.0.

Response JSON schema:

{
  "id": "resp-abc123",
  "output": [
    { "type": "message", "content": "The capital of France is Paris." }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 }
}
  • requestJson: JSON string containing the response request (see the request schema above).

Returns: JSON string containing the response object, or null on error (see the response schema above).

AtelicoLora

LoRA (Low-Rank Adaptation) adapter management subsystem. Allows loading, unloading, and scaling LoRA adapters on top of loaded base models to customize model behavior without full fine-tuning. Access via AtelicoEngine.Lora.

Methods

bool Load(string modelId, string adapterPath)

Load a LoRA adapter onto a base model. The base model must already be loaded via AtelicoEngine.LoadModel. Only one adapter can be active per model.

  • modelId: Base model ID that the adapter will be applied to, in "backend::org/model" format.

  • adapterPath: File path or HuggingFace model ID for the LoRA adapter weights (e.g., "path/to/adapter" or "org/adapter-name").

Returns: True if the adapter loaded successfully; false on error (e.g., base model not loaded, invalid adapter).

bool Unload(string modelId)

Unload the active LoRA adapter from a model, reverting to base model behavior.

  • modelId: Model ID to remove the adapter from.

Returns: True if an adapter was found and unloaded; false if none was active or on error.

bool SetScale(string modelId, float scale)

Set the runtime scale (alpha) for the active LoRA adapter on a model. A scale of 1.0 applies the adapter at full strength; 0.0 effectively disables it without unloading. Values between 0.0 and 1.0 blend between base and adapted behavior.

  • modelId: Model ID with an active adapter.

  • scale: Adapter strength multiplier (float), typically in the range 0.0 to 1.0.

Returns: True on success; false if no adapter is loaded or on error.
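
A sketch of applying an adapter at half strength ("org/pirate-speak-adapter" is a placeholder adapter ID; the base model must already be loaded via AtelicoEngine.LoadModel):

using UnityEngine;

public class LoraExample : MonoBehaviour
{
    const string Model = "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF";

    void Start()
    {
        var lora = AtelicoEngine.Instance.Lora;

        if (!lora.Load(Model, "org/pirate-speak-adapter"))
        {
            Debug.LogError("Adapter load failed.");
            return;
        }

        // Blend halfway between base and adapted behavior.
        lora.SetScale(Model, 0.5f);
    }
}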