Prompts & Generation Policy
atelico_core::prompts is the cognitive layer of the engine. It wraps a chat completion in declarative prompt assembly, output parsing, validation, retry, repair, and fallback — so a game can express what it wants from the model without re-implementing the loop at every call site.
The shape:
LmFunction (template + schema + resolvers + parser)
└─ run under GenerationPolicy (validators + repair + fallbacks + guardrails + cache)
This page is a tour of the moving parts. For the simple "render a Minijinja template and call the model once" case, see Chat Completions; the resolver/policy stack is opt-in on top of that.
Why this exists
A bare chat completion forces every game to re-solve the same problems:
- The prompt is a template that needs variables, defaults, and validation.
- The model sometimes returns slightly-wrong JSON.
- A free-text generation might leak content it shouldn't; a structured output might miss a required field.
- A weak local model fails; a stronger fallback is fine.
- Identical inputs called repeatedly should hit a cache; sampling-noise inputs should not.
LmFunction and GenerationPolicy express all of that as composable config, so the call site stays a one-liner and the engine handles the loop.
A complete example
This is the actual "select an affordance" flow inside atelico-characters (see atelico-characters/src/cognition/action.rs). It picks one item from a list, validates the structure, retries with the error appended on failure, and falls back to the first available affordance if every attempt fails.
use atelico_core::lm_function::LmFunction;
use atelico_core::prompts::generation::{FallbackConfig, GenerationPolicy, RepairPolicy};
use atelico_core::prompts::{
BuiltInParser, OutputParserConfig, PromptResolveContext, ValidatorConfig,
};
let schema = serde_json::json!({
"type": "object",
"required": ["activity_id", "reasoning"],
"properties": {
"activity_id": { "type": "string" },
"reasoning": { "type": "string" }
}
});
let lm_fn = LmFunction::new("next_action", NEXT_ACTION_TEMPLATE)
.with_schema(schema.clone())
.with_temperature(0.5)
.with_max_tokens(256)
.with_parser(OutputParserConfig::BuiltIn(BuiltInParser::TolerantField(
"activity_id".into(),
)));
let policy = GenerationPolicy::new()
.with_max_attempts(2)
.with_validator(ValidatorConfig::JsonSchemaSubset(schema))
.with_repair(RepairPolicy::RetryWithErrorMessage)
.with_fallback(FallbackConfig::Static {
content: "{\"activity_id\":\"idle\",\"reasoning\":\"fallback\"}".into(),
parsed: Some(serde_json::json!({ "activity_id": "idle" })),
domain: None,
});
let mut ctx = PromptResolveContext::default();
let (result, gen_trace) = lm_fn
.execute_with_policy(&router, model, &variables, None, &mut ctx, &policy)
.await?;
The trace returned alongside the result records every attempt: which prompt was sent, what the raw output was, which validator failed, whether local repair was applied, and which fallback (if any) produced the final value. Useful for developer trust and after-the-fact debugging.
Layer 1: Prompt resolvers
A resolver is a declarative step that produces a named variable for the Minijinja template render. You attach resolvers to an LmFunction; they run in order before the template is rendered. Templates use plain {{ variable }} references — no custom syntax to learn.
The shipped resolvers:
| Resolver | Produces | Use case |
|---|---|---|
| Variable | Caller-supplied vars + defaults + processors | Required-input validation, normalisation |
| RandomTable | One row from a .txt or .jsonl file | Random NPC quirks, weather, tavern names |
| File | Full file contents (text, JSONL, or numbered-indexed) | Inline lore, available affordances |
| Schema | A registered JSON schema as text | "Output must match this schema" injected into the prompt |
| Example | A named example file's contents | Few-shot examples that vary per call |
| Choose | Top-1 selection via atelico_matcher::Matcher | LM-driven or embedding-driven choice |
| Retrieval | Top-k entries from a registered RetrievalSource | RAG, recent memory, conversational context |
Multiple references to the same RandomTable with the same share_key reuse the first draw — a pattern from deck-editor-style prompt programming.
use atelico_core::prompts::{ResolverConfig, resolvers::{file::FileMode, random_table::TableFormat}};
use std::path::PathBuf;
let lm_fn = LmFunction::new("greet_player", GREET_TEMPLATE)
.with_resolver(ResolverConfig::RandomTable {
name: "quirk".into(),
path: PathBuf::from("npc_quirks.txt"),
format: TableFormat::Auto,
weight_field: None,
share_key: Some("npc_quirks".into()),
})
.with_resolver(ResolverConfig::File {
name: "available_actions".into(),
path: PathBuf::from("affordances.jsonl"),
mode: FileMode::Indexed,
});
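The share_key reuse described above — multiple references to the same RandomTable reusing the first draw — can be sketched as a per-call memo over a seeded RNG. This is an illustrative Python re-implementation, not the engine's code; draw_from_table and the shared_draws dict are hypothetical names:

```python
import random

def draw_from_table(rows, share_key, rng, shared_draws):
    """Draw one row from a random table, reusing an earlier draw when the
    same share_key was already used in this resolve pass. Illustrative
    sketch only -- the real resolver lives in atelico-core."""
    if share_key is not None and share_key in shared_draws:
        return shared_draws[share_key]      # reuse the first draw
    row = rng.choice(rows)
    if share_key is not None:
        shared_draws[share_key] = row       # remember it for later references
    return row

rng = random.Random(42)   # seeded RNG, as carried by the resolve context
shared = {}               # per-call shared-draw store
quirks = ["grumpy", "cheerful", "suspicious"]
first = draw_from_table(quirks, "npc_quirks", rng, shared)
second = draw_from_table(quirks, "npc_quirks", rng, shared)
assert first == second    # same share_key -> same row, no second draw
```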
The PromptResolveContext carries the seeded RNG, file roots, schema registry, registered matchers, and registered retrieval sources. All of that lives outside the LmFunction so the function definition is reusable across calls; the context is the per-call state.
Trace and reproducibility
Every resolver records what it used: the source path of any file it read, that file's content hash, and the random seed drawn. The LmFunctionResult.cache_key_material field rolls all of it up into a stable JSON blob. Two calls with the same inputs produce the same key material — this is what the prompt result cache (below) keys on.
Layer 2: Output parsers
A parser maps the raw model string into a domain value. The built-ins:
| Parser | Behaviour |
|---|---|
Raw | Pass through the raw text |
Json | Strict serde_json::from_str. Errors on malformed JSON |
IntegerField(field) | Extract an integer from field. Accepts integer or string-int |
StringField(field) | Extract a string from field |
TolerantField(field) | Recursively walk the JSON looking for field — works even when the model nests it |
ChoiceIndex { max } | Parse a 0-based choice index. Accepts "3", {"choice": 3}, or "3." |
TolerantJson | Conservative repair: strip junk, drop trailing commas, close unclosed braces |
For domain-specific parsing (e.g. mapping a model-returned speaker label back to the actual participant id) use OutputParserConfig::Custom with an Arc<dyn Fn(&str, Option<&serde_json::Value>) -> Result<ParsedOutput, ParseError> + Send + Sync>.
TolerantField is the workhorse for game integration — even when the model wraps the answer in {"choice": {"activity_id": "..."}} instead of returning a flat object, the parser pulls the right field out.
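The recursive walk behind TolerantField can be sketched in a few lines. This is a hedged Python illustration of the idea (the real parser is Rust); tolerant_field is a hypothetical name:

```python
import json

def tolerant_field(raw, field):
    """Recursively search parsed JSON for `field`, however deeply the
    model nested it. Sketch of the TolerantField behaviour only."""
    def walk(value):
        if isinstance(value, dict):
            if field in value:
                return value[field]
            for child in value.values():
                found = walk(child)
                if found is not None:
                    return found
        elif isinstance(value, list):
            for child in value:
                found = walk(child)
                if found is not None:
                    return found
        return None
    return walk(json.loads(raw))

# Flat and wrapped shapes both yield the field:
assert tolerant_field('{"activity_id": "idle"}', "activity_id") == "idle"
assert tolerant_field('{"choice": {"activity_id": "trade"}}', "activity_id") == "trade"
```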
Layer 3: Generation policy
GenerationPolicy wraps an LmFunction execution in a structured loop:
pub struct GenerationPolicy {
pub max_attempts: usize,
pub validators: Vec<ValidatorConfig>,
pub repair: RepairPolicy,
pub fallbacks: Vec<FallbackConfig>,
pub guardrails: GuardrailMode,
pub guardrails_handle: Option<GuardrailsHandle>,
pub timeout_ms: Option<u64>,
pub cache_handle: Option<Arc<InMemoryCache>>,
}
The default policy is a single-attempt no-validators no-fallbacks call — equivalent to LmFunction::execute_with_context. Layers add behaviour:
Validators
Run after every attempt. First failure determines the next step.
- JsonParse — output must parse as JSON.
- JsonSchemaSubset(schema) — structural subset check (type, required, properties, items, enum). For non-llguidance backends; full schema enforcement remains the local llguidance path's job.
- Regex(pattern) — output (after trimming) must match.
- Choice(options) — output (after trimming) must equal one of the strings.
- Custom { name, check } — your own callback.
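As a rough illustration, a structural subset check over type, required, properties, items, and enum looks like the following. This is a sketch of the idea behind JsonSchemaSubset, not the crate's implementation (and deliberately not a full JSON Schema validator); check_subset is a hypothetical name:

```python
def check_subset(schema, value):
    """Structural subset check: type, required, properties, items, enum.
    Returns an error string on failure, None on pass. Sketch only."""
    type_map = {"object": dict, "array": list, "string": str,
                "integer": int, "number": (int, float), "boolean": bool}
    t = schema.get("type")
    if t and not isinstance(value, type_map[t]):
        return f"expected {t}"
    if "enum" in schema and value not in schema["enum"]:
        return f"not one of {schema['enum']}"
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                return f"missing required field {key!r}"
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                err = check_subset(sub, value[key])
                if err:
                    return err
    if isinstance(value, list) and "items" in schema:
        for item in value:
            err = check_subset(schema["items"], item)
            if err:
                return err
    return None

schema = {"type": "object", "required": ["activity_id"],
          "properties": {"activity_id": {"type": "string"}}}
assert check_subset(schema, {"activity_id": "idle"}) is None
assert check_subset(schema, {}) == "missing required field 'activity_id'"
```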
Repair
- None — fail the attempt and either retry as-is or run fallbacks.
- LocalJsonRepair — port of GARP's conservative repair (strip junk, drop trailing commas, close unclosed braces). No new LM call — re-validates the cleaned form in process.
- RetryWithErrorMessage — re-issue the call with the validation error appended: "Your previous output was invalid: ... Return only valid output matching the requested format."
- RetryWithOriginalPromptAndError — same, but uses the original prompt verbatim instead of the modified one (useful when prior attempts mutated the prompt heavily).
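The conservative repair step (shared in spirit between LocalJsonRepair and the TolerantJson parser) can be sketched as: find the JSON start, close unclosed braces/brackets, drop trailing commas, re-parse. This naive Python version is illustrative only — it miscounts braces inside string values, which the real Rust port handles:

```python
import json
import re

def local_json_repair(raw):
    """Conservative JSON repair sketch: strip junk around the JSON,
    close unclosed braces and brackets, drop trailing commas, then
    re-validate by parsing. Not the engine's actual implementation."""
    starts = [i for i in (raw.find("{"), raw.find("[")) if i != -1]
    text = raw[min(starts):] if starts else raw                # strip leading junk
    text += "}" * max(text.count("{") - text.count("}"), 0)    # close braces
    text += "]" * max(text.count("[") - text.count("]"), 0)    # close brackets
    text = re.sub(r",\s*([}\]])", r"\1", text)                 # drop trailing commas
    return json.loads(text)                                    # re-validate in process

broken = 'Here you go: {"activity_id": "idle", "reasoning": "resting",'
assert local_json_repair(broken) == {"activity_id": "idle", "reasoning": "resting"}
```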
Fallbacks
Tried in order if max_attempts is exhausted with no validator-passing output.
- Prompt(template) — re-issue with this prompt instead.
- Model(other_model) — re-issue with a stronger backup model.
- Generation { max_tokens, temperature } — re-issue with sampling overrides.
- Static { content, parsed, domain } — return a fixed value, no LM call.
- CacheOnly — return a cache hit if any, otherwise an empty result. No LM call.
Static is what keeps a game running rather than throwing when every attempt fails — typically you provide a sensible default ("idle action") and let the simulation continue.
Guardrail integration
Function-level guardrails are decoupled via the GuardrailsCheck trait so atelico-core does not take a hard dependency on atelico-guardrails. Wrap any guardrails impl in GuardrailsHandle and attach to the policy:
let policy = GenerationPolicy::new()
.with_guardrails(GuardrailMode::Strict)
.with_guardrails_handle(GuardrailsHandle(Arc::new(MyGuardrails)));
Modes:
- Off — not invoked.
- Monitor — verdict recorded in the trace, never blocks.
- Strict — a block triggers fallbacks immediately.
- RetryOnBlock — a block participates in the retry loop with the verdict reason as the error.
Both input and output verdicts land in AttemptTrace.guardrail_verdicts. Router-level guardrails (existing) remain untouched and run independently.
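The mode-to-behaviour mapping can be made concrete with a small dispatch sketch. All names here (handle_verdict, the returned action strings) are hypothetical — this only illustrates how a blocking verdict is routed under each mode:

```python
from enum import Enum

class GuardrailMode(Enum):
    OFF = "off"
    MONITOR = "monitor"
    STRICT = "strict"
    RETRY_ON_BLOCK = "retry_on_block"

def handle_verdict(mode, blocked, reason):
    """Map a guardrail verdict to the loop's next step under each mode.
    Returns (action, error); action is 'continue', 'fallback', or 'retry'.
    Illustrative sketch of the modes listed above."""
    if mode is GuardrailMode.OFF or not blocked:
        return ("continue", None)
    if mode is GuardrailMode.MONITOR:
        return ("continue", None)      # verdict recorded in trace, never blocks
    if mode is GuardrailMode.STRICT:
        return ("fallback", reason)    # skip remaining attempts, run fallbacks
    return ("retry", reason)           # verdict reason becomes the retry error

assert handle_verdict(GuardrailMode.MONITOR, True, "pii") == ("continue", None)
assert handle_verdict(GuardrailMode.STRICT, True, "pii") == ("fallback", "pii")
```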
Layer 4: Prompt result cache
The result cache lives in atelico_core::cache::InMemoryCache and caches full LmFunctionResult JSON keyed on canonicalised (model, prompt_text, schema, sampling_params).
use atelico_core::cache::{CacheConfig, CachePolicy, InMemoryCache};
let cache = Arc::new(InMemoryCache::new(CacheConfig::default()));
let lm_fn = LmFunction::new("greet", GREET_TEMPLATE)
.with_cache_policy(CachePolicy::Exact { ttl: None });
let policy = GenerationPolicy::new().with_cache(cache);
CachePolicy per function:
- Disabled (default) — no lookup, no store.
- Exact { ttl } — hit-or-miss with optional time-to-live.
- OnlyCache — only return cached results. A miss short-circuits to fallbacks (no LM call).
- Refresh — bypass lookup but store the result.
The cache is independent of and complementary to the engine's KV-prefix cache. They live at different levels:
| Layer | Caches | Key | Hit means |
|---|---|---|---|
| InMemoryCache (this) | Final result | Full prompt + sampling params | Skip the LM entirely |
| KV prefix cache (existing) | KV pages | Token prefix of the current prompt | Skip prefill, still decode |
Both run together. Sampling params change the result-cache key but not the prefix-cache key — so a temperature change misses the result cache but still hits the prefix cache.
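The canonicalised keying can be sketched as a hash over a sorted-key JSON serialisation, so identical inputs hash identically regardless of parameter ordering while any sampling change produces a new key. This is an illustration of the keying idea only; result_cache_key is a hypothetical name, not the crate's API:

```python
import hashlib
import json

def result_cache_key(model, prompt_text, schema, sampling_params):
    """Canonicalised result-cache key: serialise with sorted keys so
    dict ordering never changes the key, then hash. Sketch only."""
    material = json.dumps(
        {"model": model, "prompt": prompt_text,
         "schema": schema, "sampling": sampling_params},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(material.encode()).hexdigest()

a = result_cache_key("m", "hi", {"type": "object"},
                     {"temperature": 0.5, "max_tokens": 64})
b = result_cache_key("m", "hi", {"type": "object"},
                     {"max_tokens": 64, "temperature": 0.5})
assert a == b     # parameter order does not matter
c = result_cache_key("m", "hi", {"type": "object"},
                     {"temperature": 0.7, "max_tokens": 64})
assert a != c     # a temperature change misses the result cache
```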
Putting the layers together
The order of operations on execute_with_policy:
1. Resolvers run, populating the accumulator.
2. Required-input check — the invalid-input fallback fires here if a required variable is missing.
3. Schema rewriter (if any) mutates the schema based on resolved variables.
4. Strict-schema preprocessing (StrictExternal or AsIs).
5. Template renders.
6. Result cache lookup. Hit → return.
7. Optional input guardrail check. A Strict block → fallbacks.
8. Attempt loop up to max_attempts:
   - Build request, call router.
   - Optional output guardrail check.
   - Validators run.
   - On pass: store in result cache (if configured), return.
   - On fail: apply repair (LocalJsonRepair, retry-with-error, etc.).
9. Attempts exhausted → fallbacks tried in order.
10. No fallback succeeded → propagate the last validation error.
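The attempt-and-fallback portion of this order of operations can be sketched as a small loop. Every name here is illustrative (the real executor is Rust); validators return an error string or None, repair may return a cleaned output without a new model call, and fallbacks are tried in order:

```python
def execute_with_policy(call_model, validators, repair, fallbacks, max_attempts):
    """Minimal sketch of the policy loop: call, validate, repair,
    retry with the error appended, then fall back. Hypothetical
    signatures -- the real executor lives in atelico-core."""
    last_error = None
    suffix = ""                                   # error appended on retry
    for _ in range(max_attempts):
        output = call_model(suffix)
        errors = [e for e in (v(output) for v in validators) if e]
        if not errors:
            return output                         # validators passed
        last_error = errors[0]                    # first failure drives next step
        repaired = repair(output, last_error)     # e.g. local repair, no LM call
        if repaired is not None and not any(v(repaired) for v in validators):
            return repaired
        suffix = f"\nYour previous output was invalid: {last_error}"
    for fallback in fallbacks:                    # tried in order
        value = fallback()
        if value is not None:
            return value
    raise ValueError(last_error)                  # no fallback succeeded

# A model that fails once, then returns valid JSON on the retry:
attempts = iter(["oops, not JSON", '{"activity_id": "idle"}'])
result = execute_with_policy(
    call_model=lambda suffix: next(attempts),
    validators=[lambda out: None if out.startswith("{") else "not JSON"],
    repair=lambda out, err: None,                 # RepairPolicy::None
    fallbacks=[lambda: '{"activity_id": "fallback"}'],
    max_attempts=2,
)
assert result == '{"activity_id": "idle"}'        # second attempt passed
```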
GenerationTrace records every step of this loop alongside the existing PromptResolutionTrace.
When to opt in
| You want... | Use |
|---|---|
| One model call, simple template, minimal config | LmFunction::execute(...) (the simple path) |
| Resolvers, parser, invalid-input fallback | LmFunction::execute_with_context(...) |
| Validators, retry, repair, fallbacks, guardrails, cache | LmFunction::execute_with_policy(..., &policy) |
The simple path is the default for new code. Migrate a call site to the policy path when you find yourself wanting any one of: a retry on bad JSON, a stronger fallback model, a static-default-when-all-fails behaviour, or function-level guardrails.
Calling from your SDK or game engine
The full LmFunction + resolver + parser stack is configured in Rust (it's a deck-style declarative API where prompts, resolvers, schemas, and parsers are composed at startup and reused across calls). Once a function is defined, the generation policy layer — validators, repair, retries, fallbacks — is exposed via JSON to all SDKs and game-engine bindings.
Each binding takes the same four JSON blobs (function, variables, optional system prompt, policy) and returns the result plus a trace.
Policy JSON schema
{
"max_attempts": 3,
"validators": [
{"type": "json_parse"},
{"type": "json_schema_subset", "schema": {"type": "object"}},
{"type": "regex", "pattern": "^[A-Z].*"},
{"type": "choice", "options": ["yes", "no"]}
],
"repair": "local_json_repair",
"fallbacks": [
{"type": "model", "model": "in-memory::stronger-model"},
{"type": "static", "content": "{\"label\": \"unknown\"}"}
],
"guardrails": "off",
"timeout_ms": 30000
}
All fields are optional. repair accepts "none", "local_json_repair", "retry_with_error_message", "retry_with_original_prompt_and_error". guardrails accepts "off", "monitor", "strict", "retry_on_block". Fallback type accepts "none", "prompt" (with "template"), "model" (with "model"), "generation" (with "max_tokens" and/or "temperature"), "static" (with "content"), "cache_only".
Function-level guardrails require a Rust-side GuardrailsHandle and are a no-op when invoked through the SDK / FFI / game-engine bindings — only router-level guardrails apply for those callers. Set "guardrails": "off" (default) when calling from outside Rust.
Usage
The same call follows for each binding, in order: Python, Godot (GDScript), Unity (C#), Unreal (C++), the C FFI, and the Rust SDK.
import json
import atelico
engine = atelico.Engine()
function = json.dumps({
"name": "classify_intent",
"template": "Classify the player's intent: {{input}}",
"schema": {"type": "object", "properties": {"intent": {"type": "string"}}},
"max_tokens": 64,
"temperature": 0.2,
})
variables = json.dumps({"input": "I want to trade my sword for gold"})
policy = json.dumps({
"max_attempts": 3,
"validators": [{"type": "json_parse"}],
"repair": "local_json_repair",
"fallbacks": [
{"type": "static", "content": '{"intent": "unknown"}'}
],
})
response = json.loads(engine.llm_call_function_with_policy(
function_json=function,
model_id="in-memory::meta-llama/Llama-3.2-1B-Instruct",
variables_json=variables,
system_prompt=None,
policy_json=policy,
))
print(response["content"])
print("attempts:", len(response["trace"]["attempts"]))
print("fell back:", response["trace"]["final_from_fallback"])
@onready var engine: AtelicoEngineNode = $AtelicoEngine
func classify_intent(player_input: String) -> Dictionary:
var function := JSON.stringify({
"name": "classify_intent",
"template": "Classify the player's intent: {{input}}",
"schema": {"type": "object", "properties": {"intent": {"type": "string"}}},
"max_tokens": 64,
"temperature": 0.2,
})
var variables := JSON.stringify({"input": player_input})
var policy := JSON.stringify({
"max_attempts": 3,
"validators": [{"type": "json_parse"}],
"repair": "local_json_repair",
"fallbacks": [
{"type": "static", "content": "{\"intent\": \"unknown\"}"}
],
})
var response_json := engine.llm_call_function_with_policy(
function, "in-memory::meta-llama/Llama-3.2-1B-Instruct",
variables, "", policy)
return JSON.parse_string(response_json)
var engine = new AtelicoEngine();
string function = JsonSerializer.Serialize(new {
name = "classify_intent",
template = "Classify the player's intent: {{input}}",
schema = new { type = "object", properties = new { intent = new { type = "string" } } },
max_tokens = 64,
temperature = 0.2,
});
string variables = JsonSerializer.Serialize(new { input = "I want to trade my sword for gold" });
string policy = JsonSerializer.Serialize(new {
max_attempts = 3,
validators = new[] { new { type = "json_parse" } },
repair = "local_json_repair",
fallbacks = new[] {
new { type = "static", content = "{\"intent\": \"unknown\"}" }
},
});
string responseJson = engine.Llm.CallFunctionWithPolicy(
function,
"in-memory::meta-llama/Llama-3.2-1B-Instruct",
variables,
policy,
systemPrompt: null);
auto* Atelico = GEngine->GetEngineSubsystem<UAtelicoAISubsystem>();
const FString Function = TEXT(R"({
"name": "classify_intent",
"template": "Classify the player's intent: {{input}}",
"schema": {"type": "object", "properties": {"intent": {"type": "string"}}},
"max_tokens": 64,
"temperature": 0.2
})");
const FString Variables = TEXT(R"({"input": "I want to trade my sword for gold"})");
const FString Policy = TEXT(R"({
"max_attempts": 3,
"validators": [{"type": "json_parse"}],
"repair": "local_json_repair",
"fallbacks": [
{"type": "static", "content": "{\"intent\": \"unknown\"}"}
]
})");
FString ResponseJson = Atelico->CallFunctionWithPolicy(
Function,
TEXT("in-memory::meta-llama/Llama-3.2-1B-Instruct"),
Variables,
Policy,
/* SystemPrompt */ TEXT(""));
const char *function = "{\"name\":\"classify_intent\","
"\"template\":\"Classify the player's intent: {{input}}\","
"\"schema\":{\"type\":\"object\","
"\"properties\":{\"intent\":{\"type\":\"string\"}}},"
"\"max_tokens\":64,\"temperature\":0.2}";
const char *variables = "{\"input\":\"I want to trade my sword for gold\"}";
const char *policy =
"{\"max_attempts\":3,"
" \"validators\":[{\"type\":\"json_parse\"}],"
" \"repair\":\"local_json_repair\","
" \"fallbacks\":[{\"type\":\"static\","
"\"content\":\"{\\\"intent\\\": \\\"unknown\\\"}\"}]}";
const char *response_json = NULL;
atelico_llm_call_function_with_policy(
engine, function,
"in-memory::meta-llama/Llama-3.2-1B-Instruct",
variables, /* system_prompt */ NULL, policy, &response_json);
use atelico_sdk::{LmFunction, GenerationPolicy, ValidatorConfig, RepairPolicy, FallbackConfig};
let function = LmFunction::new("classify_intent", "Classify the player's intent: {{input}}")
.with_schema(serde_json::json!({"type":"object","properties":{"intent":{"type":"string"}}}))
.with_max_tokens(64)
.with_temperature(0.2);
let policy = GenerationPolicy::new()
.with_max_attempts(3)
.with_validator(ValidatorConfig::JsonParse)
.with_repair(RepairPolicy::LocalJsonRepair)
.with_fallback(FallbackConfig::Static {
content: r#"{"intent": "unknown"}"#.to_string(),
parsed: None,
domain: None,
});
let variables = serde_json::json!({"input": "I want to trade my sword for gold"});
let (result, trace) = engine.llm().call_function_with_policy_sync(
&function,
"in-memory::meta-llama/Llama-3.2-1B-Instruct",
&variables,
None,
&policy,
)?;
println!("{} attempts={}", result.content, trace.attempts.len());
The response is the same JSON across all bindings:
{
"content": "{\"intent\": \"trade\"}",
"parsed": {"intent": "trade"},
"trace": {
"attempts": [
{"attempt": 1, "model": "in-memory::...", "outcome": "Pass", "validators": [...]}
],
"fallback_used": null,
"fallback_kind": null,
"final_from_fallback": false,
"cache_status": null
}
}
Where to look in the code
- atelico-core/src/prompts/lm_function.rs — the LmFunction and result types.
- atelico-core/src/prompts/resolvers/ — one file per resolver.
- atelico-core/src/prompts/parser.rs — parsers.
- atelico-core/src/prompts/generation/ — policy.rs, validator.rs, repair.rs, executor.rs, trace.rs.
- atelico-core/src/cache.rs — the result cache.
- atelico-characters/src/cognition/action.rs — a real in-tree caller using the full stack.
- atelico-core/tests/prompts_resolvers.rs and prompts_generation_policy.rs — end-to-end tests with a sequenced mock backend.