
Chat Completions API

The chat completions endpoint generates a model response for a conversation. It's OpenAI-compatible, so if you've used the OpenAI API before, this endpoint works the same way.

POST /v1/chat/completions

Basic Request

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
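
If you're calling the endpoint directly rather than through an SDK, the reply is plain JSON. Below is a minimal sketch using Python's requests library (an assumption; any HTTP client works) that sends the same request and reads the generated text and token counts out of the response:

import requests

# Same request as the curl example above.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
    },
)
resp.raise_for_status()
data = resp.json()

# The generated text lives at choices[0].message.content.
print(data["choices"][0]["message"]["content"])
# Token accounting is reported under usage.
print(data["usage"]["total_tokens"])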

Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model identifier (e.g., in-memory::meta-llama/Llama-3.2-3B-Instruct) |
| messages | array | required | Conversation messages (see below) |
| stream | boolean | false | Enable token-by-token streaming |
| temperature | float | 1.0 | Sampling temperature. Lower = more deterministic, higher = more creative |
| max_tokens | integer | model default | Maximum tokens to generate |
| response_format | object | null | Constrain output format (see Structured Generation) |
| enable_thinking | boolean | null | Enable extended thinking for models that support it (e.g., Qwen 3.5) |
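
As a sketch of how these map onto the OpenAI Python SDK (client setup is the same as in the System Prompts example below), each parameter becomes a keyword argument; the values here are illustrative, not recommendations:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[{"role": "user", "content": "Summarize the rules of chess."}],
    temperature=0.3,  # lower = more deterministic
    max_tokens=200,   # cap the length of the reply
    stream=False,     # the default; see Streaming below
)

print(response.choices[0].message.content)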

Message Roles

Each message in the messages array has a role and content:

| Role | Purpose |
| --- | --- |
| system | Sets the AI's behavior, personality, or constraints. Placed first. |
| user | The human's message. |
| assistant | The AI's previous response. Used for multi-turn context. |

System Prompts

System prompts define how the model behaves. They're essential for game NPCs, assistants, or any specialized behavior:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {
            "role": "system",
            "content": "You are a ship AI aboard a deep-space freighter. You speak formally, "
            "address the player as Captain, and provide status reports when asked. "
            "You are concerned about a recent anomaly in sector 7.",
        },
        {"role": "user", "content": "Status report."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)

Multi-Turn Conversations

Include previous messages to give the model context of the conversation:

response = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "system", "content": "You are a helpful tavern keeper named Boris."},
        {"role": "user", "content": "What do you have on the menu?"},
        {"role": "assistant", "content": "We have roasted boar, mushroom stew, and fresh bread. The stew is my specialty!"},
        {"role": "user", "content": "I'll have the stew. Any rumors lately?"},
    ],
)

print(response.choices[0].message.content)
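
The endpoint is stateless, so the client owns the history: append each reply before the next turn. A minimal sketch of that loop (the history list and ask helper are illustrative names, not part of the API):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

history = [
    {"role": "system", "content": "You are a helpful tavern keeper named Boris."},
]

def ask(user_text: str) -> str:
    # Record the user's turn, call the model, then record its reply
    # so the next call sees the full conversation.
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What do you have on the menu?"))
print(ask("I'll have the stew. Any rumors lately?"))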

Streaming

Set "stream": true to receive tokens as they're generated. This is ideal for typewriter-style dialogue UI.

SSE Format: Each token arrives as a Server-Sent Event. The stream ends with data: [DONE].

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
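
In Python, the OpenAI SDK parses these events for you; iterating over the stream yields chunks whose delta.content carries each new piece of text:
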
stream = client.chat.completions.create(
    model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
    messages=[
        {"role": "system", "content": "You are a narrator for a fantasy RPG."},
        {"role": "user", "content": "Describe the entrance to the dungeon."},
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
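
If you also need the complete reply afterwards (for example, to append it to the conversation history), buffer the pieces as they arrive. A small sketch, assuming a fresh stream created as above:

parts = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        parts.append(content)               # buffer for later
        print(content, end="", flush=True)  # still render live
print()

full_reply = "".join(parts)
# full_reply can now be stored as an assistant message.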

Temperature

Temperature controls randomness. For game applications:

| Use Case | Temperature | Why |
| --- | --- | --- |
| Factual responses, game rules | 0.1 - 0.3 | Consistent, predictable |
| NPC dialogue, general conversation | 0.6 - 0.8 | Natural variation |
| Creative writing, storytelling | 0.9 - 1.2 | More surprising, diverse |
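
To see the effect, run the same prompt at two settings from the table. This is an illustrative sketch (the prompt is arbitrary, and the high-temperature output will vary from run to run):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

for temp in (0.2, 1.0):
    response = client.chat.completions.create(
        model="in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M",
        messages=[{"role": "user", "content": "Name this tavern in one short phrase."}],
        temperature=temp,
    )
    print(f"temperature={temp}: {response.choices[0].message.content}")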

Finish Reasons

The finish_reason field tells you why generation stopped:

| Value | Meaning |
| --- | --- |
| stop | Model finished naturally (end of response) |
| length | Hit the max_tokens limit |
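
A common pattern is to detect truncation and react, for example by warning or retrying with a higher max_tokens. A sketch, continuing from a response obtained in any of the non-streaming examples above (the handling policy is up to you):

choice = response.choices[0]
if choice.finish_reason == "length":
    # The reply was cut off at max_tokens, so
    # choice.message.content is incomplete.
    print("Truncated; consider raising max_tokens.")
else:
    print(choice.message.content)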

Error Handling

Errors return standard HTTP status codes with an OpenAI-compatible error body:

{
  "error": {
    "message": "Model 'nonexistent-model' not found",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

| Status | Meaning |
| --- | --- |
| 400 | Invalid request (bad model name, malformed JSON) |
| 500 | Server error (model failed to load, inference error) |
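
When calling through the OpenAI Python SDK, these statuses surface as exceptions rather than raw JSON. A sketch of catching them, assuming the v1+ openai package, where non-2xx responses raise openai.APIStatusError:

import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

try:
    response = client.chat.completions.create(
        model="nonexistent-model",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.APIStatusError as e:
    # e.status_code mirrors the HTTP status (400, 500, ...);
    # the OpenAI-compatible error body is available on e.response.
    print(f"Request failed ({e.status_code}): {e.message}")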