Skip to main content
Version: 0.8

Server Configuration

Command Line Options

./atelico-server [OPTIONS]
OptionDefaultDescription
-p, --port11434Port to listen on

Environment Variables

Model & Asset Management

VariableDescription
ATELICO_CACHE_DIRLocal model cache directory (default: ~/Library/Caches/atelico on macOS, ~/.cache/atelico on Linux)
ATELICO_ASSET_STORE_URLRemote asset store URL (S3, R2, HTTP, local path)
ATELICO_ASSET_ACCESS_KEYAccess key for private S3/R2 stores
ATELICO_ASSET_SECRET_KEYSecret key for private S3/R2 stores
HF_TOKENHuggingFace token for gated models (fallback source)

Proxy Backends

VariableDescription
OPENAI_API_KEYEnables the openai:: model prefix
OPENAI_BASE_URLOpenAI endpoint (default: https://api.openai.com/v1)
PROXY_<NAME>_API_KEYEnables <name>:: prefix for any OpenAI-compatible API
PROXY_<NAME>_BASE_URLEndpoint for the named proxy

Content Safety

VariableDescription
GUARDRAILS_ENABLEDEnable content filtering (true/false)
GUARDRAILS_POLICYstrict or permissive

Classifiers

VariableDescription
ATELICO_CLASSIFIERSComma-separated classifier IDs to load on startup
CLASSIFIER_DIRCustom classifier directory

Data Logging

VariableDescription
ATELICO_DATA_LOG_ENABLEDEnable request/response logging (true/false)
ATELICO_DATA_LOG_DIROutput directory for logs
ATELICO_DATA_LOG_SAMPLE_RATESampling rate (0.0 - 1.0)

Inference Tuning

VariableDescription
ATELICO_BONSAI_PREFILLspeed (default) or memory — Bonsai 1-bit models pre-dequantize weights to F16 for fast prefill at the cost of ~170 MB extra RAM (1.7B) or ~2.8 GB (8B). Set memory to skip pre-dequantization.
ATELICO_CUDA_GRAPHSSet to 0 to disable CUDA graph capture/replay (useful for debugging).
ATELICO_NO_CLOCK_LOCKSet to 1 to disable GPU clock locking on Windows (WDDM performance mitigation).

Audio

VariableDescription
ATELICO_KOKORO_QUANTq8_0 or q4_0 — opt-in quantization of Kokoro's linear layers (~12% / ~18% memory saving). Default: F16 on Metal/CUDA, F32 on CPU. Convolutional decoder is unaffected.
ATELICO_TTS_DTYPEPocket TTS runtime dtype. bf16 (default on GPU), f32 (default on CPU, parity reference), or f16 (~12% faster on Metal but minor quality drift).
ATELICO_POCKET_TTS_MAX_TOKENSMaximum SentencePiece tokens per Pocket TTS chunk. Default 50 (the model's training distribution boundary); raising this produces out-of-distribution audio.

Debug

VariableDescription
RUST_LOGLog level: error, warn, info, debug, trace

Example: Production Configuration

ATELICO_ASSET_STORE_URL=https://models.yourgame.com \
GUARDRAILS_ENABLED=true \
GUARDRAILS_POLICY=strict \
RUST_LOG=info \
./atelico-server --port 11434

Example: Development with OpenAI Fallback

OPENAI_API_KEY=sk-... \
RUST_LOG=debug \
./atelico-server

This lets you use local models (in-memory::...) and OpenAI models (openai::gpt-4o-mini) side by side during development.