Version: 0.7

Guide: NPC Dialogue

This guide shows how to build AI-driven NPC dialogue with personality, streaming text, multi-turn memory, and emotion tags for driving animations.

Giving an NPC a Personality

Use the system message to define who the NPC is. This sets the tone, vocabulary, and behavior for all responses:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {
            "role": "system",
            "content": "You are Greta, a grumpy blacksmith in a medieval village. "
            "You speak in short, blunt sentences. You secretly care about "
            "the player but would never admit it. Keep responses under 3 sentences.",
        },
        {"role": "user", "content": "Can you forge me a legendary sword?"},
    ],
    temperature=0.8,
)

print(response.choices[0].message.content)
# "Legendary? Ha. You can barely hold a dagger without cutting yourself."

Tips for system prompts:

  • Keep them concise -- the system prompt is sent with every request and uses context tokens
  • Include constraints like "Keep responses under 3 sentences" to prevent rambling
  • Define the NPC's knowledge boundaries: "You only know about the village and nearby forest"
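These tips can be combined into a small helper that assembles the system prompt from persona pieces. The function below is a minimal sketch; its name and signature are illustrative, not part of any API:

```python
def build_persona(name, role, traits,
                  constraints=("Keep responses under 3 sentences.",),
                  knowledge=None):
    """Assemble a system prompt from persona pieces (hypothetical helper)."""
    parts = [f"You are {name}, {role}.", " ".join(traits)]
    if knowledge:
        # Knowledge boundary keeps the NPC from answering out-of-world questions
        parts.append(f"You only know about {knowledge}.")
    parts.extend(constraints)
    return " ".join(parts)
```

The result can be passed directly as the system message content, so every NPC definition stays in one place.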

Streaming for Typewriter Dialogue

Stream tokens as they are generated to display text progressively, creating a natural typewriter effect in your dialogue UI:

stream = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "system", "content": "You are a mysterious oracle. Speak in riddles."},
        {"role": "user", "content": "What lies beyond the mountains?"},
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
</content>
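In a game you usually want both behaviors at once: feed characters to the UI as they arrive and keep the full reply for the conversation history. A minimal sketch, assuming the stream's text deltas are passed in as strings (the `on_char` callback stands in for your dialogue UI):

```python
def play_stream(chunks, on_char=lambda ch: print(ch, end="", flush=True)):
    """Drive a typewriter UI from streamed text chunks and return the
    full reply so it can be appended to the conversation history."""
    parts = []
    for chunk in chunks:
        if not chunk:          # skip empty or None deltas
            continue
        parts.append(chunk)
        for ch in chunk:       # hand one character at a time to the UI
            on_char(ch)
    return "".join(parts)

# Usage with the stream from above:
# full_reply = play_stream(c.choices[0].delta.content for c in stream)
```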

Multi-Turn Conversation

Include previous messages so the NPC remembers the conversation:

conversation = [
    {"role": "system", "content": "You are Boris, a tavern keeper. Friendly and gossipy."}
]

def talk(player_message: str) -> str:
    conversation.append({"role": "user", "content": player_message})
    response = client.chat.completions.create(
        model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        messages=conversation,
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(talk("What's on the menu?"))
# "Roasted boar and mushroom stew! The stew's my specialty."
print(talk("I'll have the stew. Any rumors?"))
# Boris now knows you ordered the stew
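With several NPCs in a scene, each one needs its own history. One way to sketch that is a small class that owns its message list; the `client` parameter is any OpenAI-compatible client object, and the class itself is illustrative, not part of the API:

```python
MODEL = "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M"

class NPC:
    """One conversation history per NPC instance."""

    def __init__(self, client, persona: str, model: str = MODEL):
        self.client = client
        self.model = model
        self.history = [{"role": "system", "content": persona}]

    def talk(self, player_message: str) -> str:
        self.history.append({"role": "user", "content": player_message})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.history
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Each `NPC` instance remembers only its own conversation, so talking to Boris never leaks into Greta's context.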

Dialogue with Emotion Tags

Use structured generation to get both the NPC's text and metadata for driving animations. The engine automatically describes the JSON schema to the model, so you don't need to include "respond as JSON" instructions in your prompts:

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "system", "content": "You are Greta, a grumpy blacksmith."},
        {"role": "user", "content": "I brought you flowers!"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "NPCDialogue",
            "schema": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "emotion": {"type": "string", "enum": ["happy", "sad", "angry", "surprised", "neutral", "embarrassed"]},
                    "gesture": {"type": "string", "enum": ["wave", "nod", "shrug", "point", "cross_arms", "none"]},
                },
                "required": ["text", "emotion", "gesture"],
            },
            "strict": True,
        },
    },
)

import json

dialogue = json.loads(response.choices[0].message.content)
print(f"[{dialogue['emotion']}] {dialogue['text']}")
# [embarrassed] Flowers?! I... well, put them over there, I suppose.
# Use dialogue['gesture'] to trigger an animation, e.g. "cross_arms"

The output is guaranteed to match the schema -- emotion will always be one of the six defined values, and gesture will always be valid. Your game code can safely use these values without validation.
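Because the enum values are fixed, the dispatch to your animation system can be a plain lookup table. The clip names below are illustrative placeholders, not a real engine API:

```python
# Maps each schema gesture to an animation clip; "none" plays nothing.
GESTURE_CLIPS = {
    "wave": "anim_wave", "nod": "anim_nod", "shrug": "anim_shrug",
    "point": "anim_point", "cross_arms": "anim_cross_arms", "none": None,
}

def apply_dialogue(dialogue, play_clip):
    """Fire the matching animation clip (if any) and return the display line."""
    clip = GESTURE_CLIPS[dialogue["gesture"]]  # schema guarantees the key exists
    if clip is not None:
        play_clip(clip)
    return f"[{dialogue['emotion']}] {dialogue['text']}"
```

`play_clip` here stands in for whatever your engine uses to start an animation.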

Tips for Better NPC Dialogue

  • Temperature 0.6-0.8 gives natural variation without being too random
  • Keep system prompts concise -- every token counts toward context length
  • Limit response length with max_tokens (50-150 for dialogue) to prevent rambling
  • Trim conversation history -- keep the system prompt and last 10-15 messages, summarize older ones to stay within context limits
  • Use enums in structured generation for fields your game needs to branch on (emotions, actions, items)
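The history-trimming tip can be sketched as a small helper. This version simply drops older turns; in a real game you might replace them with a one-line summary instead. The function name and cutoff are assumptions, not part of any API:

```python
def trim_history(messages, keep_last=12):
    """Keep system messages plus the most recent turns, dropping older ones."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-keep_last:]
    return system + recent
```

Call it on the conversation list before each request to stay within the model's context limit.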