Version: 0.7

Unreal Engine API Reference

UAtelicoAISettings : UDeveloperSettings

Configuration for the Atelico AI Engine. Appears in Project Settings under "Atelico AI". These settings are read at engine initialization and used to configure the native backend. Some settings (like SchedulingMode) can also be changed at runtime via UAtelicoAISubsystem methods.

Properties

  • ModelDirectory (FString): Custom directory path for model file storage and caching. Leave empty to use the platform default cache directory. When set, models are downloaded to and loaded from this directory instead.

UAtelicoAISubsystem : UGameInstanceSubsystem

Main Atelico AI subsystem. Provides access to all engine capabilities including LLM chat completions, image generation, embeddings, classifiers, guardrails, LoRA adapters, key-value stores, and ANN vector search.

Access via: GetGameInstance()->GetSubsystem<UAtelicoAISubsystem>()

Persists across level transitions. Automatically initialized and shut down with the game instance. Configuration is loaded from UAtelicoAISettings (Project Settings > Atelico AI).
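For example, from any object with access to the game instance (a minimal sketch; callers should handle the case where the subsystem is not yet available):

// Fetch the subsystem from the owning game instance (e.g., inside an AActor member function).
if (UGameInstance* GameInstance = GetGameInstance())
{
    if (UAtelicoAISubsystem* Atelico = GameInstance->GetSubsystem<UAtelicoAISubsystem>())
    {
        // Subsystem is ready; safe to issue requests from here.
    }
}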

Properties

  • OnTokenReceived (FOnTokenReceived): Broadcast each frame during streaming chat when a new token is generated. Bind to this delegate to update UI text incrementally as the model generates. Only fires between ChatCompletionStream and OnChatCompleted/OnChatFailed.
  • OnChatCompleted (FOnChatCompleted): Broadcast when a streaming chat completion finishes successfully. The payload is the full ChatCompletionResponse JSON string. Preceded by zero or more OnTokenReceived broadcasts.
  • OnChatFailed (FOnChatFailed): Broadcast when a streaming chat completion fails with an error. The payload is a human-readable error message. After this fires, no further OnTokenReceived calls will occur for the failed stream.
  • OnImageGenerated (FOnImageGenerated): Broadcast when an asynchronous image generation completes. The payload is the ImageGenerationResponse JSON string containing base64-encoded PNG image data.

Methods

static bool IsCigD3D12Supported(int32 DeviceIndex = 0)

Query whether the GPU supports Compute-in-Graphics (CiG) with D3D12. CiG allows inference and rendering to share GPU hardware scheduling, reducing context-switch overhead. Requires NVIDIA R570+ driver, CUDA 12.8+, and Ada Lovelace+ GPU.

  • DeviceIndex: GPU device index to query (default: 0, the primary GPU).

Returns: true if CiG with D3D12 is supported on the specified device; false otherwise.

static bool IsCigVulkanSupported(int32 DeviceIndex = 0)

Query whether the GPU supports Compute-in-Graphics (CiG) with Vulkan. CiG allows inference and rendering to share GPU hardware scheduling, reducing context-switch overhead. Requires NVIDIA R570+ driver, CUDA 12.9+, and Ada Lovelace+ GPU.

  • DeviceIndex: GPU device index to query (default: 0, the primary GPU).

Returns: true if CiG with Vulkan is supported on the specified device; false otherwise.
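A minimal sketch probing both CiG paths at startup to decide which backend to prefer (the LogTemp category is used for illustration):

// Probe CiG support on the primary GPU.
const bool bCigD3D12 = UAtelicoAISubsystem::IsCigD3D12Supported(0);
const bool bCigVulkan = UAtelicoAISubsystem::IsCigVulkanSupported(0);
UE_LOG(LogTemp, Log, TEXT("CiG support - D3D12: %s, Vulkan: %s"),
    bCigD3D12 ? TEXT("yes") : TEXT("no"),
    bCigVulkan ? TEXT("yes") : TEXT("no"));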

FString ChatCompletion(const FString& RequestJson) [BlueprintCallable]

Synchronous chat completion (blocks the game thread until generation completes).

  • RequestJson: ChatCompletionRequest JSON string (OpenAI format) with fields:
      - model (required): Model ID in "backend::org/model" format.
      - messages (required): Array of role/content message objects.
      - max_tokens: Maximum tokens to generate (default: 256).
      - temperature: Sampling temperature, 0.0-2.0 (default: 0.7).
      - top_p: Nucleus sampling threshold (default: 1.0).
      - response_format: Optional JSON schema constraint for structured output.

Returns: ChatCompletionResponse JSON string with id, choices (array of message and finish_reason), and usage (token counts). Returns empty string on error.

Example input:

{"model":"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF","messages":[{"role":"user","content":"Hello"}],"max_tokens":100}

Example output:

{"id":"chatcmpl-abc123","choices":[{"message":{"role":"assistant","content":"Hi! How can I help?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":5,"completion_tokens":8,"total_tokens":13}}

bool ChatCompletionStream(const FString& RequestJson) [BlueprintCallable]

Start a streaming chat completion. Tokens arrive via OnTokenReceived each frame. When complete, OnChatCompleted fires with the full response. On failure, OnChatFailed fires with an error message. The stream field is set automatically.

  • RequestJson: ChatCompletionRequest JSON string (same schema as ChatCompletion).

Returns: true if the stream started successfully; false on immediate failure (e.g., model not loaded, malformed JSON).

Example input:

{"model":"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF","messages":[{"role":"user","content":"Tell me a story."}],"max_tokens":512}

FString TextCompletion(const FString& RequestJson) [BlueprintCallable]

Synchronous text completion (blocks the game thread). Continues a raw text prompt without chat formatting or role-based message structure.

  • RequestJson: CompletionRequest JSON string with fields:
      - model (required): Model ID in "backend::org/model" format.
      - prompt (required): Text prompt to continue.
      - max_tokens: Maximum tokens to generate.
      - temperature: Sampling temperature, 0.0-2.0.

Returns: CompletionResponse JSON string with id, choices (array of text and finish_reason), and usage (token counts). Returns empty string on error.

Example input:

{"model":"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF","prompt":"The dragon descended upon the village and","max_tokens":100}

Example output:

{"id":"cmpl-abc123","choices":[{"text":" breathed fire across the rooftops...","finish_reason":"stop"}],"usage":{"prompt_tokens":8,"completion_tokens":12,"total_tokens":20}}

FString Respond(const FString& RequestJson) [BlueprintCallable]

Synchronous response request following the OpenAI Responses API format. Unlike ChatCompletion, this accepts a flat input string or message array and returns structured output items.

  • RequestJson: ResponseRequest JSON string with fields:
      - model (required): Model ID in "backend::org/model" format.
      - input: User input text or message array.
      - instructions: System instructions.
      - max_output_tokens: Maximum tokens to generate.
      - temperature: Sampling temperature, 0.0-2.0.

Returns: Response JSON string with id, output (array of typed output items), and usage (token counts). Returns empty string on error.

Example input:

{"model":"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF","input":"What is the capital of France?","instructions":"Answer concisely.","max_output_tokens":50}

Example output:

{"id":"resp-abc123","output":[{"type":"message","content":"The capital of France is Paris."}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

FString GenerateImage(const FString& RequestJson) [BlueprintCallable]

Generate an image from a text prompt (blocking). Returns base64-encoded PNG data.

  • RequestJson: ImageGenerationRequest JSON string (OpenAI Images API format) with fields:
      - model (required): Image model ID in "backend::org/model" format.
      - prompt (required): Text description of the desired image.
      - size: Image dimensions as "WxH" (default: "512x512").
      - n: Number of images to generate (default: 1).
      - response_format: "b64_json" (default) or "url".

Returns: ImageGenerationResponse JSON string with created (Unix timestamp) and data (array of base64-encoded PNG objects). Returns empty string on error.

Example input:

{"model":"in-memory::PixArt-alpha/PixArt-Sigma-XL-2-1024-MS","prompt":"A medieval castle at sunset","size":"512x512","n":1}

Example output:

{"created":1700000000,"data":[{"b64_json":"iVBORw0KGgoAAAANSUhEUg..."}]}

FString RemoveBackground(const FString& RequestJson) [BlueprintCallable]

Remove the background from an image (blocking). Takes a base64-encoded image and returns a PNG with a transparent background.

  • RequestJson: JSON string with fields: image (required) - base64-encoded input image (PNG or JPEG), model - background removal model ID (uses default if omitted).

Returns: JSON string with b64_json field containing the base64-encoded PNG with transparent background. Returns empty string on error.

Example input:

{"image":"iVBORw0KGgoAAAANSUhEUg..."}

Example output:

{"b64_json":"iVBORw0KGgoAAAANSUhEUg..."}

FString Embed(const FString& RequestJson) [BlueprintCallable]

Generate embedding vectors for one or more input texts (blocking). Embeddings are dense float vectors representing the semantic meaning of text, useful for similarity search, clustering, and classification.

  • RequestJson: EmbeddingRequest JSON string (OpenAI Embeddings API format) with fields: model (required) - embedding model ID in "backend::org/model" format, input (required) - single text string or array of texts to embed.

Returns: EmbeddingResponse JSON string with data (array of embedding float arrays and index values), model, and usage. Returns empty string on error.

Example input:

{"model":"in-memory::sentence-transformers/all-MiniLM-L6-v2","input":["The knight draws his sword."]}

Example output:

{"data":[{"embedding":[0.012,-0.034,0.056],"index":0}],"model":"in-memory::sentence-transformers/all-MiniLM-L6-v2","usage":{"prompt_tokens":7,"total_tokens":7}}

float Similarity(const FString& ModelId, const FString& TextA, const FString& TextB) [BlueprintCallable]

Compute cosine similarity between two texts using the specified embedding model. Both texts are embedded and the cosine similarity of their vectors is computed. Useful for quick semantic comparisons without managing raw embedding vectors.

  • ModelId: Embedding model ID in "backend::org/model" format.
  • TextA: First text to compare.
  • TextB: Second text to compare.

Returns: Cosine similarity score in the range -1.0 to 1.0, where 1.0 means identical semantic meaning and -1.0 means opposite. Returns -2.0 on error (e.g., model not loaded).
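A minimal sketch gating a dialogue branch on similarity (PlayerUtterance, AcceptQuest, and the 0.7 threshold are illustrative):

// Compare the player's free-form text against a canonical intent phrase.
const float Score = Atelico->Similarity(
    TEXT("in-memory::sentence-transformers/all-MiniLM-L6-v2"),
    PlayerUtterance,
    TEXT("I accept the quest"));
if (Score > 0.7f) // Also screens out the -2.0 error sentinel.
{
    AcceptQuest();
}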

FString ClassifierPredict(const FString& RequestJson) [BlueprintCallable]

Predict the class of input text using a loaded classifier model (blocking). Returns the top prediction along with optional top-k ranked alternatives.

  • RequestJson: Classification request JSON string with fields: model_id (required) - classifier model ID, text (required) - input text to classify, top_k - number of top predictions to return (default: 1).

Returns: Classification response JSON string with label (top predicted class), probability (confidence score), and top (array of ranked label/probability pairs). Returns empty string on error.

Example input:

{"model_id":"intent-classifier","text":"I want to buy a health potion","top_k":3}

Example output:

{"label":"purchase","probability":0.92,"top":[{"label":"purchase","probability":0.92},{"label":"inquiry","probability":0.05},{"label":"combat","probability":0.03}]}

FString CheckInput(const FString& Text) [BlueprintCallable]

Check input text against configured safety guardrails before sending to a model. Use this to filter player input before passing it to ChatCompletion or similar methods. Guardrails must be enabled in UAtelicoAISettings for this to return meaningful results.

  • Text: The user input text to check against safety rules.

Returns: SafetyVerdict JSON string with action ("allow", "block", or "warn"), checker_name (which guardrail triggered), and optional score and reason fields. Returns empty string on error.

Example output:

{"action":"allow","checker_name":"content-safety","score":0.01}

FString CheckOutput(const FString& Text) [BlueprintCallable]

Check model output text against configured safety guardrails before displaying to the player. Use this to filter AI-generated responses before showing them in the game UI. Guardrails must be enabled in UAtelicoAISettings for this to return meaningful results.

  • Text: The model output text to check against safety rules.

Returns: SafetyVerdict JSON string with action ("allow", "block", or "warn"), checker_name (which guardrail triggered), and optional score and reason fields. Returns empty string on error.

Example output:

{"action":"allow","checker_name":"content-safety","score":0.02}

FString CheckImagePrompt(const FString& Prompt) [BlueprintCallable]

Check an image generation prompt against safety guardrails before generating. Use this to filter prompts before passing them to GenerateImage. Guardrails must be enabled in UAtelicoAISettings for this to return meaningful results.

  • Prompt: The image generation prompt text to check against safety rules.

Returns: SafetyVerdict JSON string with action ("allow", "block", or "warn"), checker_name (which guardrail triggered), and optional score and reason fields. Returns empty string on error.

Example output:

{"action":"allow","checker_name":"image-safety","score":0.01}

bool LoadAdapter(const FString& ModelId, const FString& AdapterPath) [BlueprintCallable]

Load a LoRA adapter onto a base model. The base model must already be loaded via LoadModel. Only one adapter can be active per model at a time; loading a new adapter replaces the previous one.

  • ModelId: Base model ID that the adapter will be applied to, in "backend::org/model" format.
  • AdapterPath: File path or HuggingFace model ID for the LoRA adapter weights.

Returns: true if the adapter loaded successfully; false on error (e.g., base model not loaded, adapter weights incompatible).

bool UnloadAdapter(const FString& ModelId) [BlueprintCallable]

Unload the active LoRA adapter from a model, reverting to base model behavior. This frees the adapter weights from memory while keeping the base model loaded.

  • ModelId: Model ID to remove the adapter from.

Returns: true if an adapter was found and unloaded; false if no adapter was active or the model is not loaded.

bool SetAdapterScale(const FString& ModelId, float Scale) [BlueprintCallable]

Set the runtime scale (alpha) for the active LoRA adapter on a model. A scale of 1.0 applies the adapter at full strength; 0.0 effectively disables it without unloading. Values between 0.0 and 1.0 blend base and adapted behavior.

  • ModelId: Model ID with an active LoRA adapter.
  • Scale: Adapter strength multiplier, typically in 0.0 to 1.0.

Returns: true on success; false if no adapter is loaded on the model or on error.
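A minimal sketch of the adapter lifecycle (the adapter path is illustrative):

// Apply a LoRA adapter to a loaded base model, blend it at half strength, then remove it.
const FString ModelId = TEXT("in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF");
if (Atelico->LoadAdapter(ModelId, TEXT("my-org/pirate-dialogue-lora")))
{
    Atelico->SetAdapterScale(ModelId, 0.5f); // 0.0 = base behavior, 1.0 = full adapter strength.
}
// Later, revert to base behavior and free the adapter weights:
Atelico->UnloadAdapter(ModelId);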

bool LoadModel(const FString& ModelId) [BlueprintCallable]

Pre-load a model synchronously (blocks until the model is fully loaded and ready). The model will be downloaded from HuggingFace Hub if not already cached locally. Once loaded, the model remains in memory until explicitly unloaded or the subsystem shuts down.

  • ModelId: Model identifier, e.g. "in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF".

Returns: true if the model loaded successfully; false on error (e.g., invalid model ID, download failure, insufficient VRAM).

bool UnloadModel(const FString& ModelId) [BlueprintCallable]

Unload a model, freeing its GPU and system memory. Any in-flight inference using this model will be cancelled.

  • ModelId: Model identifier previously passed to LoadModel.

Returns: true if the model was found and unloaded; false if the model was not loaded.

bool IsModelLoaded(const FString& ModelId)

Check if a model is currently loaded and ready for inference.

  • ModelId: Model identifier to check.

Returns: true if the model is loaded and ready; false otherwise.

FString ListModels() [BlueprintCallable]

List all currently loaded models with their types and memory usage.

Returns: JSON array of model descriptors, each containing id (model identifier), type ("llm", "embedding", "image", or "classifier"), and size_bytes (approximate memory usage). Returns "[]" if no models are loaded.

Example output:

[{"id":"in-memory::meta-llama/Llama-3.2-1B-Instruct-GGUF","type":"llm","size_bytes":1234567890}]

bool CreateKvStore(const FString& ConfigJson) [BlueprintCallable]

Create a new embedding-backed key-value store for semantic search over game data. Entries inserted into the store are automatically embedded using the specified model, enabling natural-language queries against structured game content.

  • ConfigJson: Store configuration JSON string with fields: embedding_model (required) - model ID for embedding entries, store_id - custom store identifier (auto-generated if omitted).

Returns: true if the store was created successfully; false on error (e.g., embedding model not loaded).

Example input:

{"embedding_model":"in-memory::sentence-transformers/all-MiniLM-L6-v2","store_id":"npc-dialogue"}

bool KvStoreInsert(const FString& StoreId, const FString& EntriesJson) [BlueprintCallable]

Insert entries into a KV store. Each entry is automatically embedded using the store's configured embedding model for later semantic search via KvStoreQuery.

  • StoreId: Store identifier from CreateKvStore.
  • EntriesJson: JSON array of entry objects, each with: key (required) - unique key for the entry, value (required) - text content to store and embed, metadata - optional arbitrary metadata object for filtering.

Returns: true on success; false on error (e.g., store not found, embedding failure).

Example input:

[{"key":"greeting-1","value":"Welcome, traveler!","metadata":{"npc":"innkeeper"}}]

FString KvStoreQuery(const FString& StoreId, const FString& QueryJson) [BlueprintCallable]

Query a KV store using semantic search. The query text is embedded and compared against all stored entries, returning the most semantically similar results.

  • StoreId: Store identifier from CreateKvStore.
  • QueryJson: Query parameters JSON string with fields: query (required) - text to search for semantically, top_k - maximum number of results (default: 5), filter - optional metadata filter expression.

Returns: JSON array of results sorted by descending similarity, each containing key, value, score (similarity), and metadata. Returns empty string on error.

Example input:

{"query":"hello friend","top_k":2}

Example output:

[{"key":"greeting-1","value":"Welcome, traveler!","score":0.89,"metadata":{"npc":"innkeeper"}}]

FString KvStoreScan(const FString& StoreId, const FString& Filter, int32 Limit) [BlueprintCallable]

Scan entries in a KV store with an optional metadata filter (no semantic search). Returns entries in insertion order, useful for iterating over store contents.

  • StoreId: Store identifier from CreateKvStore.
  • Filter: Metadata filter expression, or empty string for no filter.
  • Limit: Maximum number of entries to return.

Returns: JSON array of entries in insertion order, each containing key, value, and metadata. Returns empty string on error.

Example output:

[{"key":"greeting-1","value":"Welcome, traveler!","metadata":{"npc":"innkeeper"}}]

bool DestroyKvStore(const FString& StoreId) [BlueprintCallable]

Destroy a KV store and release all associated resources, including embedded vectors and stored entries. After destruction, the store ID becomes available for reuse by a subsequent CreateKvStore call.

  • StoreId: Store identifier from CreateKvStore.

Returns: true if the store was found and destroyed; false if no store exists with that ID.

int64 CreateAnnIndex(const FString& ConfigJson) [BlueprintCallable]

Create an Approximate Nearest Neighbor (ANN) index backed by HNSW for fast vector similarity search. Use this for custom embedding workflows where you manage raw vectors directly instead of using the higher-level KV store.

  • ConfigJson: Index configuration JSON string with fields:
      - dim (required): Vector dimensionality (must match inserted vectors).
      - max_elements (required): Maximum number of vectors the index can hold.
      - m: HNSW connections per node (default: 16; higher = better recall, more memory).
      - ef_construction: Build-time search width (default: 200).
      - ef_search: Query-time search width (default: 50).

Returns: Index handle ID (positive int64) on success, or 0 on failure.

Example input:

{"dim":384,"max_elements":10000,"m":16,"ef_construction":200}

bool BuildAnnIndex(int64 IndexId) [BlueprintCallable]

Build the ANN index graph. Must be called after all insertions and before any searches. This operation is O(n * log(n)) and may take noticeable time for large indices.

  • IndexId: Index handle from CreateAnnIndex.

Returns: true if the graph was built successfully; false on error (e.g., invalid handle, no vectors inserted).

FString SearchAnnIndex(int64 IndexId, const TArray<float>& QueryVector, int32 K) [BlueprintCallable]

Search for the k nearest neighbors in the ANN index. The index must have been built via BuildAnnIndex before searching.

  • IndexId: Index handle from CreateAnnIndex.
  • QueryVector: Query vector with length matching the index's configured dim.
  • K: Number of nearest neighbors to return.

Returns: JSON array of results sorted by ascending distance, each containing label_id (application-defined label from insertion) and distance (lower is closer). Returns "[]" on error or if the index is empty.

Example output:

[{"label_id":42,"distance":0.05},{"label_id":17,"distance":0.12},{"label_id":99,"distance":0.31}]

bool DestroyAnnIndex(int64 IndexId) [BlueprintCallable]

Destroy an ANN index and free all associated memory, including stored vectors and the HNSW graph structure.

  • IndexId: Index handle from CreateAnnIndex.

Returns: true if the index was found and destroyed; false if no index exists with that handle.

void SetSchedulingMode(EAtelicoSchedulingMode Mode) [BlueprintCallable]

Set the GPU scheduling mode to control how rendering and inference share GPU time. Takes effect immediately for subsequent inference operations. Can also be configured statically via UAtelicoAISettings::SchedulingMode in Project Settings.

  • Mode: The desired scheduling priority balance: PrioritizeCompute (minimize inference latency), Balance (default, share GPU time evenly), or PrioritizeGraphics (maximize rendering FPS).

void SetVramBudgetMb(int32 Mb) [BlueprintCallable]

Set the maximum VRAM budget in megabytes for AI model storage. The engine will refuse to load models that would exceed this limit. Use 0 for unlimited (uses all available VRAM as needed).

  • Mb: VRAM budget in megabytes. Use 0 for unlimited.

void SetTargetTps(int32 Tps) [BlueprintCallable]

Set the target tokens-per-second rate for inference pacing. The engine will yield GPU time between tokens to stay near this rate, freeing GPU cycles for rendering. Useful for dialogue systems where tokens should appear at readable speed rather than as fast as possible.

  • Tps: Target tokens per second. Use 0 for unlimited (fastest possible).
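A minimal sketch of runtime tuning (the specific values are illustrative):

// Favor rendering during heavy scenes and pace dialogue tokens at readable speed.
Atelico->SetSchedulingMode(EAtelicoSchedulingMode::PrioritizeGraphics);
Atelico->SetVramBudgetMb(4096); // Refuse model loads beyond ~4 GB of VRAM.
Atelico->SetTargetTps(15);      // ~15 tokens/second; 0 = fastest possible.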