Version: 0.7

Models

Listing Available Models

curl http://localhost:11434/v1/models

Returns all models the server can serve:

{
  "object": "list",
  "data": [
    {"id": "in-memory::meta-llama/Llama-3.2-1B-Instruct", "object": "model", "owned_by": "atelico"},
    {"id": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M", "object": "model", "owned_by": "atelico"},
    ...
  ]
}
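To pull out just the model IDs from that response, the listing can be piped through jq (assuming jq is installed; it is not required by the server itself):

```shell
# List only the model IDs from the /v1/models response (requires jq).
curl -s http://localhost:11434/v1/models | jq -r '.data[].id'
```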

Model Naming

Models use a prefix::model-name syntax:

| Prefix | Backend | Description |
|---|---|---|
| in-memory:: | Local GPU/CPU | On-device inference (default if no prefix) |
| openai:: | OpenAI proxy | Forwards to OpenAI API |
| image-generation:: | Local GPU/CPU | Image generation models |
| mock:: | Mock | Returns hardcoded responses (for testing) |

If you omit the prefix, in-memory:: is assumed.
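For example, the two requests below target the same local model; the second relies on the implicit in-memory:: default. This sketch assumes the standard OpenAI-compatible /v1/chat/completions endpoint:

```shell
# Explicit backend prefix
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "in-memory::meta-llama/Llama-3.2-1B-Instruct",
       "messages": [{"role": "user", "content": "Hi"}]}'

# Equivalent: no prefix, so in-memory:: is assumed
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-1B-Instruct",
       "messages": [{"role": "user", "content": "Hi"}]}'
```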

Supported Architectures

The engine supports these model architectures in both float (SafeTensors) and quantized (GGUF) formats:

| Architecture | Example Models |
|---|---|
| LLaMA / Mistral | Llama 3.x, Mistral 7B |
| Qwen 3 | Qwen3-0.6B, Qwen3-1.7B, Qwen3-4B, Qwen3-8B |
| Qwen 3.5 | Qwen3.5 (hybrid DeltaNet + attention) |
| Gemma 4 | Gemma 4 |
| Nemotron-H | Nemotron-H (hybrid Mamba2 + attention) |
| Parcae | Parcae (stable looped transformer) |
| SmolLM | SmolLM2-135M, SmolLM2-360M, SmolLM2-1.7B |
| SmolLM3 | SmolLM3-3B |
| Bonsai 1-bit | PrismML Bonsai 1-bit (Qwen3 architecture) |

Any HuggingFace model using one of these architectures can be loaded. The engine auto-detects the architecture from config.json or GGUF metadata.

Available LLM Models

Use ./atelico-asset-downloader list --namespace models to see all models available in your asset store. The models listed by GET /v1/models are pre-configured defaults:

| Model | Parameters | Format | VRAM |
|---|---|---|---|
| meta-llama/Llama-3.2-1B-Instruct | 1B | float | ~2 GB |
| meta-llama/Llama-3.2-1B-Instruct-Q4_K_M | 1B | GGUF | ~0.8 GB |
| meta-llama/Llama-3.2-3B-Instruct | 3B | float | ~6 GB |
| meta-llama/Llama-3.2-3B-Instruct-Q4_K_M | 3B | GGUF | ~2 GB |
| meta-llama/Llama-3.1-8B-Instruct | 8B | float | ~16 GB |
| meta-llama/Llama-3.1-8B-Instruct-Q4_K_M | 8B | GGUF | ~5 GB |

Any model downloaded to the cache can be used by passing its ID to the model field, even if it's not in the default list above.
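For instance, a cached model outside the default list can be requested directly by its ID (Qwen/Qwen3-0.6B here is illustrative; substitute any model present in your cache), assuming the OpenAI-compatible /v1/chat/completions endpoint:

```shell
# Use a cached model that is not in the pre-configured defaults.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello"}]}'
```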

Image Generation Models

| Model | Description |
|---|---|
| pixart-alpha | PixArt-Alpha (DMD one-step) |
| pixart-sigma | PixArt-Sigma (multi-step DPM-Solver) |
| sana-sprint | Sana Sprint (SCM 2-step, fast) |
| sana-0.6b | Sana 0.6B (flow matching, 20-step) |
| sana-1.6b | Sana 1.6B (flow matching, 20-step, highest quality) |

Use with the image-generation:: prefix: "model": "image-generation::sana-sprint".
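A request might look like the following. The endpoint path and the prompt field are assumptions based on the OpenAI-style image API; check your server's route listing if it differs:

```shell
# Hypothetical image request; endpoint path assumed to mirror OpenAI's images API.
curl http://localhost:11434/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "image-generation::sana-sprint",
       "prompt": "a lighthouse at dusk"}'
```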

Text-to-Speech Models

| Model | Description |
|---|---|
| hexgrad/Kokoro-82M | Kokoro 82M (54 voices, 9 languages, streaming audio output) |

TTS is accessed via the /v1/audio/speech endpoint (OpenAI-compatible). Specify a voice name in the voice field. Language is detected automatically from the text.
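A minimal request, following the OpenAI speech API shape; the voice name af_heart is illustrative (pick any of the model's 54 voices), and the output format may differ on your build:

```shell
# Synthesize speech and save the audio stream to a file.
curl http://localhost:11434/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "hexgrad/Kokoro-82M",
       "voice": "af_heart",
       "input": "Hello there"}' \
  -o speech.wav
```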

Speech-to-Text Models

| Model | Description |
|---|---|
| openai/whisper-large-v3 | Whisper large-v3 (high-quality multilingual transcription) |
| openai/whisper-small | Whisper small (fast transcription, lower VRAM) |

STT is accessed via the /v1/audio/transcriptions endpoint (OpenAI-compatible).
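Following the OpenAI transcription API shape, audio is uploaded as multipart form data (recording.wav is a placeholder for your own file):

```shell
# Transcribe a local audio file with the small Whisper model.
curl http://localhost:11434/v1/audio/transcriptions \
  -F model=openai/whisper-small \
  -F file=@recording.wav
```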

Choosing a Model

  • Dialogue, simple conversation: a 1B quantized model is fast enough
  • NPC personalities, creative writing: a 3B quantized model is a good balance
  • Complex reasoning, structured generation: an 8B quantized model gives the best results
  • Shipping a game: quantized models are recommended -- smaller download, less VRAM, similar quality

Downloading Models

Use the asset downloader to fetch models:

# List what's available
./atelico-asset-downloader list --namespace models

# Download a specific model
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

# Interactive mode -- browse and select
./atelico-asset-downloader interactive

Custom Asset Store

If your team hosts models on a private store:

./atelico-asset-downloader \
  --store-url https://your-store.example.com \
  --access-key YOUR_KEY \
  --secret-key YOUR_SECRET \
  download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

HuggingFace Fallback

If a model isn't in the asset store, the server can fetch it from HuggingFace as a last resort. Set the HF_TOKEN environment variable for gated models:

HF_TOKEN=hf_your_token_here ./atelico-server

Model Formats

| Format | Extension | Description |
|---|---|---|
| SafeTensors | .safetensors | Full-precision weights (F16/F32) |
| GGUF | .gguf | Quantized weights (Q4, Q8, etc.) |
| Bonsai 1-bit | .safetensors (special) | 1-bit quantized (experimental) |

Cache Location

Downloaded models are stored locally:

| Platform | Path |
|---|---|
| macOS | ~/Library/Caches/atelico/models/ |
| Linux | ~/.cache/atelico/models/ |
| Windows | %LOCALAPPDATA%\atelico\models\ |

Override with the ATELICO_CACHE_DIR environment variable.
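For example, to keep models on a larger drive (the path here is illustrative):

```shell
# Point the model cache at a custom directory before starting the server.
ATELICO_CACHE_DIR=/mnt/big-drive/atelico-models ./atelico-server
```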

Using a Proxy Backend

Forward requests to OpenAI or any OpenAI-compatible API:

OPENAI_API_KEY=sk-... ./atelico-server

Then use the openai:: prefix:

{
  "model": "openai::gpt-4o-mini",
  "messages": [{"role": "user", "content": "Hello"}]
}

For other providers, use the generic proxy syntax:

PROXY_ANTHROPIC_API_KEY=sk-... \
PROXY_ANTHROPIC_BASE_URL=https://api.anthropic.com/v1 \
./atelico-server

Then use anthropic::model-name as the model identifier.
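With the proxy configured as above, a request could look like this (the model name is illustrative, and the OpenAI-compatible /v1/chat/completions endpoint is assumed):

```shell
# Route a chat request through the generic anthropic:: proxy backend.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic::claude-3-5-haiku-latest",
       "messages": [{"role": "user", "content": "Hello"}]}'
```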