# Models
## Listing Available Models

```shell
curl http://localhost:11434/v1/models
```

Returns all models the server can serve:
```json
{
  "object": "list",
  "data": [
    {"id": "in-memory::meta-llama/Llama-3.2-1B-Instruct", "object": "model", "owned_by": "atelico"},
    {"id": "in-memory::meta-llama/Llama-3.2-3B-Instruct-Q4_K_M", "object": "model", "owned_by": "atelico"},
    ...
  ]
}
```
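If you only need the model IDs, the response can be filtered with a one-liner (this assumes python3 is on your PATH; jq works equally well):

```shell
# Print only the model IDs from the /v1/models response
curl -s http://localhost:11434/v1/models \
  | python3 -c 'import json,sys; print("\n".join(m["id"] for m in json.load(sys.stdin)["data"]))'
```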
## Model Naming

Models use a prefix::model-name syntax:

| Prefix | Backend | Description |
|---|---|---|
| in-memory:: | Local GPU/CPU | On-device inference (default if no prefix) |
| openai:: | OpenAI proxy | Forwards to OpenAI API |
| image-generation:: | Local GPU/CPU | Image generation models |
| mock:: | Mock | Returns hardcoded responses (for testing) |

If you omit the prefix, in-memory:: is assumed.
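The resolution rule can be sketched as a small shell function (this mirrors the documented naming convention, not the server's actual parser):

```shell
# Split a model ID on the first "::" into backend and model name;
# bare names fall back to the in-memory backend.
resolve() {
  case "$1" in
    *::*) echo "${1%%::*} ${1#*::}" ;;
    *)    echo "in-memory $1" ;;
  esac
}

resolve "openai::gpt-4o-mini"               # -> openai gpt-4o-mini
resolve "meta-llama/Llama-3.2-1B-Instruct"  # -> in-memory meta-llama/Llama-3.2-1B-Instruct
```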
## Supported Architectures
The engine supports these model architectures in both float (SafeTensors) and quantized (GGUF) formats:
| Architecture | Example Models |
|---|---|
| LLaMA / Mistral | Llama 3.x, Mistral 7B |
| Qwen 3 | Qwen3-0.6B, Qwen3-1.7B, Qwen3-4B, Qwen3-8B |
| Qwen 3.5 | Qwen3.5 (hybrid DeltaNet + attention) |
| Gemma 4 | Gemma 4 |
| Nemotron-H | Nemotron-H (hybrid Mamba2 + attention) |
| Parcae | Parcae (stable looped transformer) |
| SmolLM | SmolLM2-135M, SmolLM2-360M, SmolLM2-1.7B |
| SmolLM3 | SmolLM3-3B |
| Bonsai 1-bit | PrismML Bonsai 1-bit (Qwen3 architecture) |
Any HuggingFace model using one of these architectures can be loaded. The engine auto-detects the architecture from config.json or GGUF metadata.
## Available LLM Models
Use ./atelico-asset-downloader list --namespace models to see all models available in your asset store. The models listed by GET /v1/models are pre-configured defaults:
| Model | Parameters | Format | VRAM |
|---|---|---|---|
| meta-llama/Llama-3.2-1B-Instruct | 1B | float | ~2 GB |
| meta-llama/Llama-3.2-1B-Instruct-Q4_K_M | 1B | GGUF | ~0.8 GB |
| meta-llama/Llama-3.2-3B-Instruct | 3B | float | ~6 GB |
| meta-llama/Llama-3.2-3B-Instruct-Q4_K_M | 3B | GGUF | ~2 GB |
| meta-llama/Llama-3.1-8B-Instruct | 8B | float | ~16 GB |
| meta-llama/Llama-3.1-8B-Instruct-Q4_K_M | 8B | GGUF | ~5 GB |
Any model downloaded to the cache can be used by passing its ID to the model field, even if it's not in the default list above.
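For example, a chat completion against the 3B quantized default might look like this (assuming the server exposes the standard OpenAI-compatible /v1/chat/completions route):

```shell
# Request body for a local chat completion; no prefix means in-memory::
BODY='{"model": "meta-llama/Llama-3.2-3B-Instruct-Q4_K_M", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$BODY"
```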
## Image Generation Models
| Model | Description |
|---|---|
| pixart-alpha | PixArt-Alpha (DMD one-step) |
| pixart-sigma | PixArt-Sigma (multi-step DPM-Solver) |
| sana-sprint | Sana Sprint (SCM 2-step, fast) |
| sana-0.6b | Sana 0.6B (flow matching, 20-step) |
| sana-1.6b | Sana 1.6B (flow matching, 20-step, highest quality) |
Use with the image-generation:: prefix: "model": "image-generation::sana-sprint".
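A sketch of a full request, assuming the server mirrors the OpenAI-compatible /v1/images/generations route (check your server's route list if it differs):

```shell
# Image generation request; the prompt is illustrative
BODY='{"model": "image-generation::sana-sprint", "prompt": "a watercolor fox", "n": 1}'

curl http://localhost:11434/v1/images/generations \
  -H "Content-Type: application/json" \
  -d "$BODY"
```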
## Text-to-Speech Models
| Model | Description |
|---|---|
| hexgrad/Kokoro-82M | Kokoro 82M — 54 voices, 9 languages, streaming audio output |
TTS is accessed via the /v1/audio/speech endpoint (OpenAI-compatible). Specify a voice name in the voice field. Language is detected automatically from the text.
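A minimal request might look like the following; the voice name af_heart is an assumption here, so substitute any of the 54 Kokoro voices your build ships:

```shell
# Synthesize speech and save the audio; "af_heart" is an assumed voice name
BODY='{"model": "hexgrad/Kokoro-82M", "input": "Welcome back, traveler.", "voice": "af_heart"}'

curl http://localhost:11434/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d "$BODY" \
  --output speech.wav
```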
## Speech-to-Text Models
| Model | Description |
|---|---|
| openai/whisper-large-v3 | Whisper large-v3 — high-quality multilingual transcription |
| openai/whisper-small | Whisper small — fast transcription, lower VRAM |
STT is accessed via the /v1/audio/transcriptions endpoint (OpenAI-compatible).
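A minimal transcription request, sent as a multipart form upload per the OpenAI convention (recording.wav is a placeholder path):

```shell
# Transcribe a local audio file; point recording.wav at your own file
MODEL=openai/whisper-small

curl http://localhost:11434/v1/audio/transcriptions \
  -F "file=@recording.wav" \
  -F "model=$MODEL"
```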
## Choosing a Model
- Dialogue, simple conversation: 1B quantized is fast enough
- NPC personalities, creative writing: 3B quantized is a good balance
- Complex reasoning, structured generation: 8B quantized for best results
- Shipping a game: Quantized models are recommended -- smaller download, less VRAM, similar quality
## Downloading Models

Use the asset downloader to fetch models:

```shell
# List what's available
./atelico-asset-downloader list --namespace models

# Download a specific model
./atelico-asset-downloader download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M

# Interactive mode -- browse and select
./atelico-asset-downloader interactive
```
### Custom Asset Store

If your team hosts models on a private store:

```shell
./atelico-asset-downloader \
  --store-url https://your-store.example.com \
  --access-key YOUR_KEY \
  --secret-key YOUR_SECRET \
  download meta-llama/Llama-3.2-3B-Instruct-Q4_K_M
```
### HuggingFace Fallback

If a model isn't in the asset store, the server can fetch it from HuggingFace as a last resort. Set the HF_TOKEN environment variable for gated models:

```shell
HF_TOKEN=hf_your_token_here ./atelico-server
```
## Model Formats
| Format | Extension | Description |
|---|---|---|
| SafeTensors | .safetensors | Full-precision weights (F16/F32) |
| GGUF | .gguf | Quantized weights (Q4, Q8, etc.) |
| Bonsai 1-bit | .safetensors (special) | 1-bit quantized (experimental) |
## Cache Location
Downloaded models are stored locally:
| Platform | Path |
|---|---|
| macOS | ~/Library/Caches/atelico/models/ |
| Linux | ~/.cache/atelico/models/ |
| Windows | %LOCALAPPDATA%\atelico\models\ |
Override with the ATELICO_CACHE_DIR environment variable.
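For example, to keep model downloads on a larger drive (the path is illustrative):

```shell
# Redirect the model cache before starting the server
export ATELICO_CACHE_DIR=/data/atelico-models
./atelico-server
```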
## Using a Proxy Backend

Forward requests to OpenAI or any OpenAI-compatible API:

```shell
OPENAI_API_KEY=sk-... ./atelico-server
```
Then use the openai:: prefix:

```json
{
  "model": "openai::gpt-4o-mini",
  "messages": [{"role": "user", "content": "Hello"}]
}
```
For other providers, use the generic proxy syntax:

```shell
PROXY_ANTHROPIC_API_KEY=sk-... \
PROXY_ANTHROPIC_BASE_URL=https://api.anthropic.com/v1 \
./atelico-server
```
Then use anthropic::model-name as the model identifier.
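For example (the model name after the prefix is illustrative; use whatever your provider serves):

```shell
# Chat completion routed through the anthropic:: proxy backend
BODY='{"model": "anthropic::claude-3-5-haiku-latest", "messages": [{"role": "user", "content": "Hello"}]}'

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$BODY"
```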