Version: 0.9

Structured Generation

Structured generation constrains the model's output to match a format you define. The output is guaranteed to be valid -- no retries, no hoping the model formats things correctly.

Supported formats:

  • JSON Schema -- structured objects with typed fields, enums, nested objects, and arrays
  • Choice -- pick exactly one from a list of options (e.g., emotions, actions, item types)
  • Regex -- match a pattern (e.g., dates, codes, identifiers)
  • Grammar -- match a Lark context-free grammar (e.g., custom DSLs, complex formats)

This is essential for game development where AI output needs to be consumed by code.

How It Works

Add a response_format field to your chat completion request with a JSON Schema:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
      {"role": "user", "content": "Generate a random fantasy weapon"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "Weapon",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "type": {"type": "string", "enum": ["sword", "axe", "bow", "staff", "dagger"]},
            "damage": {"type": "integer", "minimum": 1, "maximum": 100},
            "rarity": {"type": "string", "enum": ["common", "uncommon", "rare", "legendary"]},
            "description": {"type": "string"}
          },
          "required": ["name", "type", "damage", "rarity", "description"]
        },
        "strict": true
      }
    }
  }'

The content in the response is always valid JSON matching your schema. Parse it directly in your game code.

Schema Format

The response_format object has this structure:

{
  "type": "json_schema",
  "json_schema": {
    "name": "SchemaName",
    "schema": { ... },
    "strict": true,
    "schema_injection": "concise"
  }
}
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | | Must be `"json_schema"` |
| `json_schema.name` | string | | A name for the schema (used internally) |
| `json_schema.schema` | object | | Standard JSON Schema object |
| `json_schema.strict` | boolean | `false` | Enforce strict adherence (recommended: `true`) |
| `json_schema.schema_injection` | string | `"concise"` | Controls how the schema is described in the prompt (see below) |

Schema Injection

When you use structured generation, the engine automatically describes the JSON schema in the system message so the model knows what to produce. This is critical for smaller models -- without it, the model may generate degenerate output (whitespace loops) because it doesn't know it should output JSON.

The schema_injection field controls the verbosity of this description:

| Level | Description | Token cost |
|---|---|---|
| `"none"` | No injection. Use when your prompt already contains JSON instructions. | 0 |
| `"light"` | Field names and types only. Compact one-liner. | Low |
| `"concise"` | Field names, types, and descriptions from the schema. Default. | Medium |
| `"full"` | Injects the complete JSON schema verbatim. | High |

You don't need to include "respond as JSON" in your prompts -- the engine handles this automatically. If you're migrating from a setup where you manually included JSON instructions in your prompts, you can either remove them (recommended) or set schema_injection to "none" to avoid redundancy.

OpenAI clients that don't send schema_injection get the default "concise" behavior automatically. The field is fully backward-compatible.
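As a sketch, a small helper that assembles a response_format payload with an explicit schema_injection level (the helper name build_response_format and the Mood schema are illustrative, not part of the API; here the prompt is assumed to already contain JSON instructions, so injection is disabled):

```python
import json

def build_response_format(name, schema, schema_injection="concise", strict=True):
    """Assemble a response_format payload with an explicit schema_injection level."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": strict,
            "schema_injection": schema_injection,
        },
    }

# The system prompt already spells out the JSON format, so skip injection.
fmt = build_response_format(
    "Mood",
    {
        "type": "object",
        "properties": {"mood": {"type": "string"}},
        "required": ["mood"],
    },
    schema_injection="none",
)
print(json.dumps(fmt, indent=2))
```

The resulting dict can be passed directly as the `response_format` argument in the examples below.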

Example: NPC Dialogue with Emotion Tags

Generate NPC dialogue with emotion metadata for driving animations:

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "system", "content": "You are Greta, a grumpy but kind-hearted blacksmith."},
        {"role": "user", "content": "I brought you flowers!"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "NPCDialogue",
            "schema": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "emotion": {"type": "string", "enum": ["happy", "sad", "angry", "surprised", "neutral", "embarrassed"]},
                    "gesture": {"type": "string", "enum": ["wave", "nod", "shrug", "point", "cross_arms", "none"]},
                },
                "required": ["text", "emotion", "gesture"],
            },
            "strict": True,
        },
    },
)

dialogue = json.loads(response.choices[0].message.content)
print(f"[{dialogue['emotion']}] {dialogue['text']}")
# e.g. [embarrassed] Flowers?! I... well, put them over there, I suppose.

Your game reads emotion to set the character's facial expression and gesture to trigger an animation.

Example: Quest Generation

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "user", "content": "Create a side quest for a coastal fishing village troubled by sea creatures"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Quest",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "giver_npc": {"type": "string"},
                    "description": {"type": "string"},
                    "objectives": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "description": {"type": "string"},
                                "type": {"type": "string", "enum": ["kill", "collect", "talk", "explore", "escort"]},
                                "target": {"type": "string"},
                                "count": {"type": "integer", "minimum": 1},
                            },
                            "required": ["description", "type", "target", "count"],
                        },
                    },
                    "reward_gold": {"type": "integer", "minimum": 0},
                    "reward_xp": {"type": "integer", "minimum": 0},
                },
                "required": ["title", "giver_npc", "description", "objectives", "reward_gold", "reward_xp"],
            },
            "strict": True,
        },
    },
)

quest = json.loads(response.choices[0].message.content)
print(f"Quest: {quest['title']} (from {quest['giver_npc']})")
for obj in quest["objectives"]:
    print(f"  [{obj['type']}] {obj['description']} ({obj['count']}x {obj['target']})")
print(f"  Reward: {quest['reward_gold']}g, {quest['reward_xp']} XP")

Choice Constraint

Force the model to pick exactly one option from a list. Useful for classification, branching dialogue, and any pick-from-a-set scenario.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
      {"role": "system", "content": "You are an NPC reacting to the player entering a dark cave."},
      {"role": "user", "content": "How do you feel?"}
    ],
    "response_format": {
      "type": "choice",
      "choices": ["happy", "sad", "angry", "scared", "neutral"]
    }
  }'

The response content will be exactly one of the strings in choices -- no quotes, no extras. Parse it directly as a string.

This is ideal for driving game state: emotion systems, dialogue branching, action selection, difficulty ratings, etc.
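Because the reply is guaranteed to be one of the listed strings, game code can dispatch on it with a plain lookup. A minimal sketch, assuming a hypothetical animation table (the EMOTION_ANIMATIONS names are illustrative, not part of any API):

```python
# Hypothetical mapping from a choice-constrained emotion to an animation clip.
EMOTION_ANIMATIONS = {
    "happy": "smile",
    "sad": "frown",
    "angry": "scowl",
    "scared": "tremble",
    "neutral": "idle",
}

def react(emotion: str) -> str:
    # The choice constraint guarantees `emotion` is one of the keys above,
    # so a direct lookup is safe -- no fallback branch needed.
    return EMOTION_ANIMATIONS[emotion]

print(react("scared"))  # -> tremble
```

In practice `emotion` would be `response.choices[0].message.content` from a request like the one above.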

Regex Constraint

Force the model to output text matching a regular expression. Useful for codes, dates, identifiers, and structured text that isn't JSON.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
      {"role": "user", "content": "Generate a fantasy item code."}
    ],
    "response_format": {
      "type": "regex",
      "pattern": "ITEM-[A-Z]{3}-[0-9]{3}"
    }
  }'

Example output: ITEM-LYS-042

The pattern uses standard regex syntax. The model's output is guaranteed to match the pattern exactly.
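Since the output is shaped by the same pattern you wrote, downstream code can parse it with capture groups instead of defensive string handling. A sketch (parse_item_code is an illustrative helper, not part of the API):

```python
import re

def parse_item_code(code: str) -> tuple[str, int]:
    """Split a generated item code into its letter block and numeric id."""
    m = re.fullmatch(r"ITEM-([A-Z]{3})-([0-9]{3})", code)
    if m is None:
        raise ValueError(f"unexpected item code: {code!r}")
    return m.group(1), int(m.group(2))

print(parse_item_code("ITEM-LYS-042"))  # -> ('LYS', 42)
```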

Grammar Constraint (Lark)

For complex output formats that go beyond regex but aren't JSON, you can specify a Lark context-free grammar.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
      {"role": "user", "content": "Greet the player."}
    ],
    "response_format": {
      "type": "grammar",
      "lark": "start: greeting \" \" name\ngreeting: \"hello\" | \"hi\" | \"hey\"\nname: /[A-Z][a-z]+/"
    }
  }'

Example output: hello Adventurer
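Unescaped, the single-line grammar in that request reads:

```lark
start: greeting " " name
greeting: "hello" | "hi" | "hey"
name: /[A-Z][a-z]+/
```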

This is the most flexible format -- you can define any structure that a context-free grammar can express. Use it for custom DSLs, formatted commands, or structured text that doesn't fit JSON.

Structured Generation with Streaming

Structured generation works with streaming too. The JSON is built token by token, and the final assembled output is guaranteed valid. Accumulate the streamed tokens, then parse the complete JSON when the stream finishes.
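The accumulate-then-parse step can be sketched as a small helper (assemble_stream is an illustrative name; the simulated deltas below stand in for the `chunk.choices[0].delta.content` values a real `stream=True` request would yield):

```python
import json

def assemble_stream(deltas):
    """Join streamed content deltas (skipping None) and parse the final JSON."""
    return json.loads("".join(d for d in deltas if d))

# In a real loop you would append chunk.choices[0].delta.content for each
# chunk from client.chat.completions.create(..., stream=True); simulated here:
streamed = ['{"name": "Emberfang', None, '", "damage": ', "42}"]
weapon = assemble_stream(streamed)
print(weapon["name"], weapon["damage"])  # -> Emberfang 42
```

Only the fully assembled string is guaranteed valid; intermediate prefixes are generally not parseable JSON.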

Tips

  • Keep schemas focused. Smaller schemas with fewer fields produce better results. Don't ask for 20 fields when you need 5.
  • Use enums for fields that should be one of a known set of values (emotions, item types, difficulty levels). This prevents the model from inventing invalid values.
  • System prompts still matter. The schema constrains the structure, but the system prompt influences the content quality. "Generate a balanced RPG encounter for level 8 players" produces better results than just "Generate an encounter."
  • Temperature matters less with structured generation since the schema already constrains the output, but lower temperatures (0.3-0.7) produce more consistent field values.
  • Add description fields to your schema properties. These are included in the auto-injected prompt hint (at "concise" level), helping the model understand what each field should contain.
  • Use "schema_injection": "light" on very small models (1B) with long prompts to save context tokens. Use "full" on larger models (7B+) with complex nested schemas for maximum precision.