Structured Generation
Structured generation constrains the model's output to match a format you define. The output is guaranteed to be valid -- no retries, no hoping the model formats things correctly.
Supported formats:
- JSON Schema -- structured objects with typed fields, enums, nested objects, and arrays
- Choice -- pick exactly one from a list of options (e.g., emotions, actions, item types)
- Regex -- match a pattern (e.g., dates, codes, identifiers)
- Grammar -- match a Lark context-free grammar (e.g., custom DSLs, complex formats)
This is essential for game development where AI output needs to be consumed by code.
How It Works
Add a response_format field to your chat completion request with a JSON Schema:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "user", "content": "Generate a random fantasy weapon"}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "Weapon",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"type": {"type": "string", "enum": ["sword", "axe", "bow", "staff", "dagger"]},
"damage": {"type": "integer", "minimum": 1, "maximum": 100},
"rarity": {"type": "string", "enum": ["common", "uncommon", "rare", "legendary"]},
"description": {"type": "string"}
},
"required": ["name", "type", "damage", "rarity", "description"]
},
"strict": true
}
}
}'
The content in the response is always valid JSON matching your schema. Parse it directly in your game code.
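For example, a minimal Python sketch of the consuming side (the weapon values below are illustrative, not real model output):

```python
import json

# Example response content from the request above -- the structure is
# guaranteed by the schema; the actual values are made up for illustration.
content = '{"name": "Emberfang", "type": "sword", "damage": 42, "rarity": "rare", "description": "A blade forged in dragonfire."}'

weapon = json.loads(content)  # parses cleanly: the output is schema-valid JSON
assert weapon["type"] in {"sword", "axe", "bow", "staff", "dagger"}
assert 1 <= weapon["damage"] <= 100
print(f"{weapon['rarity'].title()} {weapon['type']}: {weapon['name']}")
```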
Schema Format
The response_format object has this structure:
{
"type": "json_schema",
"json_schema": {
"name": "SchemaName",
"schema": { ... },
"strict": true,
"schema_injection": "concise"
}
}
| Field | Type | Default | Description |
|---|---|---|---|
| type | string | -- | Must be "json_schema" |
| json_schema.name | string | -- | A name for the schema (used internally) |
| json_schema.schema | object | -- | Standard JSON Schema object |
| json_schema.strict | boolean | false | Enforce strict adherence (recommended: true) |
| json_schema.schema_injection | string | "concise" | Controls how the schema is described in the prompt (see below) |
Schema Injection
When you use structured generation, the engine automatically describes the JSON schema in the system message so the model knows what to produce. This is critical for smaller models -- without it, the model may generate degenerate output (whitespace loops) because it doesn't know it should output JSON.
The schema_injection field controls the verbosity of this description:
| Level | Description | Token Cost |
|---|---|---|
| "none" | No injection. Use when your prompt already contains JSON instructions. | 0 |
| "light" | Field names and types only. Compact one-liner. | Low |
| "concise" | Field names, types, and descriptions from the schema. Default. | Medium |
| "full" | Injects the complete JSON schema verbatim. | High |
You don't need to include "respond as JSON" in your prompts -- the engine handles this automatically. If you're migrating from a setup where you manually included JSON instructions in your prompts, you can either remove them (recommended) or set schema_injection to "none" to avoid redundancy.
OpenAI clients that don't send schema_injection get the default "concise" behavior automatically. The field is fully backward-compatible.
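If you set the level per request, a small hypothetical helper keeps the boilerplate in one place (make_response_format is not part of the API, just a convenience you might write):

```python
# Hypothetical helper: wrap a JSON Schema in a response_format payload,
# choosing the schema_injection level explicitly per request.
def make_response_format(name: str, schema: dict, injection: str = "concise") -> dict:
    assert injection in {"none", "light", "concise", "full"}
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": True,
            "schema_injection": injection,
        },
    }

# A small model with a long prompt: trade schema detail for context tokens.
fmt = make_response_format(
    "Weapon",
    {"type": "object", "properties": {"name": {"type": "string"}}},
    injection="light",
)
```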
Example: NPC Dialogue with Emotion Tags
Generate NPC dialogue with emotion metadata for driving animations:
- Python
- Godot (GDScript)
- Unity (C#)
- Unreal (C++)
import json
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")
response = client.chat.completions.create(
model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
messages=[
{"role": "system", "content": "You are Greta, a grumpy but kind-hearted blacksmith."},
{"role": "user", "content": "I brought you flowers!"},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "NPCDialogue",
"schema": {
"type": "object",
"properties": {
"text": {"type": "string"},
"emotion": {"type": "string", "enum": ["happy", "sad", "angry", "surprised", "neutral", "embarrassed"]},
"gesture": {"type": "string", "enum": ["wave", "nod", "shrug", "point", "cross_arms", "none"]},
},
"required": ["text", "emotion", "gesture"],
},
"strict": True,
},
},
)
dialogue = json.loads(response.choices[0].message.content)
print(f"[{dialogue['emotion']}] {dialogue['text']}")
# e.g. [embarrassed] Flowers?! I... well, put them over there, I suppose.
var dialogue_schema = {
"type": "json_schema",
"json_schema": {
"name": "NPCDialogue",
"schema": {
"type": "object",
"properties": {
"text": {"type": "string"},
"emotion": {"type": "string", "enum": ["happy", "sad", "angry", "surprised", "neutral", "embarrassed"]},
"gesture": {"type": "string", "enum": ["wave", "nod", "shrug", "point", "cross_arms", "none"]}
},
"required": ["text", "emotion", "gesture"]
},
"strict": true
}
}
func talk_to_greta(player_input: String) -> void:
var request = {
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "system", "content": "You are Greta, a grumpy but kind-hearted blacksmith."},
{"role": "user", "content": player_input}
],
"response_format": dialogue_schema
}
engine.async_chat_completions(JSON.stringify(request))
func _on_async_request_completed(_job_id: int, response: String) -> void:
var parsed = JSON.parse_string(response)
var dialogue = JSON.parse_string(parsed["choices"][0]["message"]["content"])
# Use emotion to set animation state
npc_sprite.play(dialogue["emotion"])
# Use gesture to trigger animation
if dialogue["gesture"] != "none":
npc_sprite.play_gesture(dialogue["gesture"])
dialogue_label.text = dialogue["text"]
[System.Serializable]
public class NPCDialogue
{
public string text;
public string emotion; // "happy", "sad", "angry", "surprised", "neutral", "embarrassed"
public string gesture; // "wave", "nod", "shrug", "point", "cross_arms", "none"
}
public async Task<NPCDialogue> TalkToGreta(string playerInput)
{
var request = new
{
model = "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
messages = new object[]
{
new { role = "system", content = "You are Greta, a grumpy but kind-hearted blacksmith." },
new { role = "user", content = playerInput }
},
response_format = new
{
type = "json_schema",
json_schema = new
{
name = "NPCDialogue",
schema = new
{
type = "object",
properties = new
{
text = new { type = "string" },
emotion = new { type = "string", @enum = new[] { "happy", "sad", "angry", "surprised", "neutral", "embarrassed" } },
gesture = new { type = "string", @enum = new[] { "wave", "nod", "shrug", "point", "cross_arms", "none" } }
},
required = new[] { "text", "emotion", "gesture" }
},
strict = true
}
}
};
var json = JsonSerializer.Serialize(request);
var content = new StringContent(json, Encoding.UTF8, "application/json");
var response = await client.PostAsync($"{BaseUrl}/chat/completions", content);
var responseJson = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(responseJson);
var dialogueJson = doc.RootElement.GetProperty("choices")[0]
.GetProperty("message").GetProperty("content").GetString();
// IncludeFields is needed because System.Text.Json ignores public fields by default
return JsonSerializer.Deserialize<NPCDialogue>(dialogueJson, new JsonSerializerOptions { IncludeFields = true });
// dialogue.emotion -> set animator state
// dialogue.gesture -> trigger animation clip
// dialogue.text -> display in dialogue UI
}
// Define the schema as a JSON object
TSharedPtr<FJsonObject> BuildDialogueSchema()
{
// Build the response_format object with json_schema
// containing text, emotion (enum), and gesture (enum) fields
// ... (same pattern as other JSON building examples) ...
}
void UAtelicoClient::TalkToGreta(const FString& PlayerInput)
{
TSharedPtr<FJsonObject> Body = MakeShareable(new FJsonObject);
Body->SetStringField("model", "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M");
// ... add messages ...
Body->SetObjectField("response_format", BuildDialogueSchema());
SendRequestWithCallback(Body, [this](TSharedPtr<FJsonObject> Response)
{
FString ContentJson = Response->GetArrayField("choices")[0]
->AsObject()->GetObjectField("message")->GetStringField("content");
TSharedPtr<FJsonObject> Dialogue;
auto Reader = TJsonReaderFactory<>::Create(ContentJson);
FJsonSerializer::Deserialize(Reader, Dialogue);
FString Text = Dialogue->GetStringField("text");
FString Emotion = Dialogue->GetStringField("emotion");
FString Gesture = Dialogue->GetStringField("gesture");
// Drive animation: SetAnimState(Emotion), PlayGesture(Gesture)
// Display text in dialogue widget
});
}
Your game reads emotion to set the character's facial expression and gesture to trigger an animation.
Example: Quest Generation
- Python
- Godot (GDScript)
- Unity (C#)
- Unreal (C++)
response = client.chat.completions.create(
model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
messages=[
{"role": "user", "content": "Create a side quest for a coastal fishing village troubled by sea creatures"}
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "Quest",
"schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"giver_npc": {"type": "string"},
"description": {"type": "string"},
"objectives": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"type": {"type": "string", "enum": ["kill", "collect", "talk", "explore", "escort"]},
"target": {"type": "string"},
"count": {"type": "integer", "minimum": 1},
},
"required": ["description", "type", "target", "count"],
},
},
"reward_gold": {"type": "integer", "minimum": 0},
"reward_xp": {"type": "integer", "minimum": 0},
},
"required": ["title", "giver_npc", "description", "objectives", "reward_gold", "reward_xp"],
},
"strict": True,
},
},
)
quest = json.loads(response.choices[0].message.content)
print(f"Quest: {quest['title']} (from {quest['giver_npc']})")
for obj in quest["objectives"]:
print(f" [{obj['type']}] {obj['description']} ({obj['count']}x {obj['target']})")
print(f" Reward: {quest['reward_gold']}g, {quest['reward_xp']} XP")
var quest_schema = {
"type": "json_schema",
"json_schema": {
"name": "Quest",
"schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"giver_npc": {"type": "string"},
"description": {"type": "string"},
"objectives": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"type": {"type": "string", "enum": ["kill", "collect", "talk", "explore", "escort"]},
"target": {"type": "string"},
"count": {"type": "integer", "minimum": 1}
},
"required": ["description", "type", "target", "count"]
}
},
"reward_gold": {"type": "integer", "minimum": 0},
"reward_xp": {"type": "integer", "minimum": 0}
},
"required": ["title", "giver_npc", "description", "objectives", "reward_gold", "reward_xp"]
},
"strict": true
}
}
func generate_quest(context: String) -> void:
var request = {
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [{"role": "user", "content": context}],
"response_format": quest_schema
}
engine.async_chat_completions(JSON.stringify(request))
# Parse the response JSON into your quest system data structures
[System.Serializable]
public class Quest
{
public string title;
public string giver_npc;
public string description;
public QuestObjective[] objectives;
public int reward_gold;
public int reward_xp;
}
[System.Serializable]
public class QuestObjective
{
public string description;
public string type; // "kill", "collect", "talk", "explore", "escort"
public string target;
public int count;
}
// After receiving response:
var quest = JsonSerializer.Deserialize<Quest>(response.choices[0].message.content, new JsonSerializerOptions { IncludeFields = true });
questLog.AddQuest(quest);
// Define USTRUCTs for deserialization
USTRUCT()
struct FQuestObjective
{
GENERATED_BODY()
UPROPERTY() FString Description;
UPROPERTY() FString Type; // "kill", "collect", "talk", "explore", "escort"
UPROPERTY() FString Target;
UPROPERTY() int32 Count;
};
USTRUCT()
struct FQuest
{
GENERATED_BODY()
UPROPERTY() FString Title;
UPROPERTY() FString GiverNpc;
UPROPERTY() FString Description;
UPROPERTY() TArray<FQuestObjective> Objectives;
UPROPERTY() int32 RewardGold;
UPROPERTY() int32 RewardXp;
};
// Parse the content JSON into FQuest using FJsonObjectConverter
FQuest Quest;
FJsonObjectConverter::JsonObjectStringToUStruct(ContentJson, &Quest);
QuestManager->AddQuest(Quest);
Choice Constraint
Force the model to pick exactly one option from a list. Useful for classification, branching dialogue, and any pick-from-a-set scenario.
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "system", "content": "You are an NPC reacting to the player entering a dark cave."},
{"role": "user", "content": "How do you feel?"}
],
"response_format": {
"type": "choice",
"choices": ["happy", "sad", "angry", "scared", "neutral"]
}
}'
The response content will be exactly one of the strings in choices -- no quotes, no extras. Parse it directly as a string.
This is ideal for driving game state: emotion systems, dialogue branching, action selection, difficulty ratings, etc.
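A sketch of a choice-driven emotion system in Python (the animation names are illustrative; the request mirrors the curl example above):

```python
# Because the response content is exactly one of the choices, a plain dict
# lookup can never miss -- no fallback branch needed.
EMOTION_TO_ANIM = {
    "happy": "smile_idle",
    "sad": "slump",
    "angry": "scowl",
    "scared": "cower",
    "neutral": "idle",
}

request = {
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
        {"role": "system", "content": "You are an NPC reacting to the player entering a dark cave."},
        {"role": "user", "content": "How do you feel?"},
    ],
    "response_format": {"type": "choice", "choices": list(EMOTION_TO_ANIM)},
}

emotion = "scared"  # e.g. response.choices[0].message.content
npc_animation = EMOTION_TO_ANIM[emotion]
```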
Regex Constraint
Force the model to output text matching a regular expression. Useful for codes, dates, identifiers, and structured text that isn't JSON.
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "user", "content": "Generate a fantasy item code."}
],
"response_format": {
"type": "regex",
"pattern": "ITEM-[A-Z]{3}-[0-9]{3}"
}
}'
Example output: ITEM-LYS-042
The pattern uses standard regex syntax. The model's output is guaranteed to match the pattern exactly.
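Because the output matches the pattern, you can split it into its parts without defensive parsing. A small Python sketch using the item-code pattern above:

```python
import re

ITEM_CODE = r"ITEM-[A-Z]{3}-[0-9]{3}"

# The engine guarantees the match, so this fullmatch is belt-and-braces
# rather than a validation step you need at runtime.
output = "ITEM-LYS-042"  # example output from the request above
assert re.fullmatch(ITEM_CODE, output)

prefix, letters, digits = output.split("-")
item_id = int(digits)  # -> 42
```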
Grammar Constraint (Lark)
For complex output formats that go beyond regex but aren't JSON, you can specify a Lark context-free grammar.
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
"messages": [
{"role": "user", "content": "Greet the player."}
],
"response_format": {
"type": "grammar",
"lark": "start: greeting \" \" name\ngreeting: \"hello\" | \"hi\" | \"hey\"\nname: /[A-Z][a-z]+/"
}
}'
Example output: hello Adventurer
This is the most flexible format -- you can define any structure that a context-free grammar can express. Use it for custom DSLs, formatted commands, or structured text that doesn't fit JSON.
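The one-line escaping in the curl example is hard to read. In Python you can keep the grammar as a multi-line string and let JSON encoding handle the escapes:

```python
import json

# Same grammar as the curl example, written readably; json.dumps produces
# the escaped one-line form for the request body.
grammar = '''\
start: greeting " " name
greeting: "hello" | "hi" | "hey"
name: /[A-Z][a-z]+/
'''

response_format = {"type": "grammar", "lark": grammar}
payload = json.dumps({"response_format": response_format})
```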
Structured Generation with Streaming
Structured generation works with streaming too. The JSON is built token by token, and the final assembled output is guaranteed valid. Accumulate the streamed tokens, then parse the complete JSON when the stream finishes.
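The accumulate-then-parse step can be sketched as a small Python helper (the chunk strings stand in for the delta content you pull off each streamed event, e.g. chunk.choices[0].delta.content in the OpenAI client):

```python
import json

def assemble_structured_stream(chunks):
    """Accumulate streamed content deltas, then parse the completed JSON.

    Partial buffers are not valid JSON mid-stream; only the fully
    assembled output is guaranteed to parse.
    """
    buffer = "".join(chunk for chunk in chunks if chunk)
    return json.loads(buffer)

# Illustrative stream of token fragments:
dialogue = assemble_structured_stream(['{"text": "Hmph.', '", "emotion": ', '"neutral"}'])
```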
Tips
- Keep schemas focused. Smaller schemas with fewer fields produce better results. Don't ask for 20 fields when you need 5.
- Use enums for fields that should be one of a known set of values (emotions, item types, difficulty levels). This prevents the model from inventing invalid values.
- System prompts still matter. The schema constrains the structure, but the system prompt influences the content quality. "Generate a balanced RPG encounter for level 8 players" produces better results than just "Generate an encounter."
- Temperature matters less with structured generation since the schema already constrains the output, but lower temperatures (0.3-0.7) produce more consistent field values.
- Add description fields to your schema properties. These are included in the auto-injected prompt hint (at "concise" level), helping the model understand what each field should contain.
- Use "schema_injection": "light" on very small models (1B) with long prompts to save context tokens. Use "full" on larger models (7B+) with complex nested schemas for maximum precision.