Version: 0.9

Classifiers

Embedding-based classifiers for text and images. Three families are shipped:

  • Centroid — mean embedding per class, nearest-cosine prediction. Zero hyperparameters, strong baseline.
  • KNN (HNSW) — approximate nearest-neighbour vote. Best when classes have local structure rather than a single global centroid.
  • SetFit — few-shot contrastive fine-tuning of a sentence transformer with an optional LP-FT recipe.

All three share one API surface: a tagged embedder config picks the modality, and the classifier itself is modality-agnostic. Add image (or future audio) data by swapping the embedder; everything else — training, evaluation, persistence, the inference endpoint — stays the same.
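
To make the Centroid rule concrete, here is a minimal sketch over plain f32 vectors; it is illustrative only, not the shipped implementation: training averages the embeddings of each class, prediction returns the label whose centroid has the highest cosine similarity to the input.

// Illustrative sketch of the Centroid rule, not the atelico-classifiers code.

fn centroid(embeddings: &[Vec<f32>]) -> Vec<f32> {
    // Mean embedding over all examples of one class.
    let mut mean = vec![0.0; embeddings[0].len()];
    for e in embeddings {
        for (m, x) in mean.iter_mut().zip(e) {
            *m += x / embeddings.len() as f32;
        }
    }
    mean
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn predict<'a>(input: &[f32], centroids: &'a [(String, Vec<f32>)]) -> &'a str {
    // Nearest-cosine prediction over the per-class centroids.
    centroids
        .iter()
        .max_by(|(_, a), (_, b)| {
            cosine(input, a).partial_cmp(&cosine(input, b)).unwrap()
        })
        .map(|(label, _)| label.as_str())
        .expect("at least one class")
}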

Picking an embedder

Classifiers are configured with a ClassifierEmbedderConfig, a tagged enum over the per-modality embedders shipped by atelico-embed:

  • Text(EmbedderConfig): sentence transformers (AllMiniLML6V2, BGEBaseENV15, BGESmallENV15, etc.). Default for text.
  • Vision(VisionEmbedderConfig): DINOv2-Small or DINOv2-Large. New in 0.9, used by image classifiers.

Mixing modalities in one classifier is not supported — train one classifier per modality.
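
Both variants wrap an ordinary embedder config. A minimal sketch of constructing each follows; the Text arm assumes EmbedderConfig implements Default, since its exact fields live in atelico-embed:

use atelico_classifiers::embedder_config::ClassifierEmbedderConfig;
use atelico_embed::{
    vision_embedder::{VisionEmbedderConfig, VisionEmbeddingModel},
    EmbedderConfig,
};

// Text classifier: sentence-transformer embeddings (Default impl assumed here).
let text = ClassifierEmbedderConfig::Text(EmbedderConfig::default());

// Image classifier: DINOv2 embeddings, fields as shown in the training example below.
let vision = ClassifierEmbedderConfig::Vision(VisionEmbedderConfig {
    model: VisionEmbeddingModel::DINOv2Small,
    batch_size: 4,
});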

Dataset format

A single JSONL row carries exactly one of text, image_path, or audio_path, plus a label:

{"text": "US stocks rally on tech earnings", "label": "Business"}
{"image_path": "data/cats/cat_001.jpg", "label": "cat"}
{"image_path": "data/dogs/dog_017.jpg", "label": "dog"}

Rows that mix input fields, or omit all of them, are rejected at load time with an explicit error.
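
The rule is mechanical, and a hedged sketch of the check looks like this; the Row struct and error messages are illustrative, not the loader's actual types (requires serde with the derive feature):

use serde::Deserialize;

// Illustrative row type mirroring the JSONL fields above.
#[derive(Deserialize)]
struct Row {
    text: Option<String>,
    image_path: Option<String>,
    audio_path: Option<String>,
    label: String,
}

fn validate(row: &Row) -> Result<(), String> {
    // Exactly one of the three input fields must be present.
    let present = [&row.text, &row.image_path, &row.audio_path]
        .iter()
        .filter(|f| f.is_some())
        .count();
    match present {
        1 => Ok(()),
        0 => Err(format!("row {:?}: no input field set", row.label)),
        _ => Err(format!("row {:?}: multiple input fields set", row.label)),
    }
}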

Training (Rust)

Training happens in Rust via the atelico-classifiers crate; the resulting model is then loaded into the engine for serving from any binding. Example (image-modality centroid):

use atelico_classifiers::{
    centroid::{CentroidClassifier, CentroidConfig},
    data::InMemoryDataset,
    embedder_config::ClassifierEmbedderConfig,
};
use atelico_embed::{
    vision_embedder::{VisionEmbedderConfig, VisionEmbeddingModel},
    EmbedInputOwned,
};
use std::path::{Path, PathBuf};

fn main() -> anyhow::Result<()> {
    // One labelled image per class keeps the example minimal.
    let inputs: Vec<EmbedInputOwned> = vec![
        EmbedInputOwned::image(PathBuf::from("data/cats/01.jpg")),
        EmbedInputOwned::image(PathBuf::from("data/dogs/01.jpg")),
    ];
    let labels = vec!["cat".into(), "dog".into()];
    let dataset = InMemoryDataset::new(inputs, labels);

    // The Vision embedder variant selects the image modality.
    let cfg = CentroidConfig {
        embedder: ClassifierEmbedderConfig::Vision(VisionEmbedderConfig {
            model: VisionEmbeddingModel::DINOv2Small,
            batch_size: 4,
        }),
    };
    let mut clf = CentroidClassifier::new(cfg, /* assets */ None)?;
    clf.train(&dataset, /* batch_size */ 4)?;
    clf.save(Path::new("./models/animals"))?;
    Ok(())
}

Swap CentroidConfig for KnnConfig or SetFitConfig to use the other classifier types — the call sites stay identical.
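
As a sketch of that swap, assuming a knn module that mirrors the centroid one (KnnClassifier and a Default impl for KnnConfig are assumptions here, not confirmed API):

use atelico_classifiers::knn::{KnnClassifier, KnnConfig};

// Reuses `dataset` and the Vision embedder from the example above. KnnConfig's
// HNSW-specific fields aren't documented on this page, so an assumed Default
// impl fills them in; only the config and classifier types change.
let cfg = KnnConfig {
    embedder: ClassifierEmbedderConfig::Vision(VisionEmbedderConfig {
        model: VisionEmbeddingModel::DINOv2Small,
        batch_size: 4,
    }),
    ..Default::default()
};
let mut clf = KnnClassifier::new(cfg, /* assets */ None)?;
clf.train(&dataset, /* batch_size */ 4)?;
clf.save(Path::new("./models/animals_knn"))?;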

Serving — text input

Once a classifier is saved to disk, load it under an ID and call classifier_predict.

import json
import atelico

engine = atelico.Engine()
# (Implementation note: classifier loading is currently performed at engine
# startup via the ATELICO_CLASSIFIERS environment variable; programmatic load
# from Python follows the same pattern as other subsystems.)

result = json.loads(engine.classifier_predict("sentiment", "I love this!", top_k=3))
print(result["label"], result["probability"])

Serving — image input (DINOv2)

The classifier referenced by model_id must have been trained with a Vision embedder. Calling the image endpoint against a text-only classifier returns an error.

import json
import atelico

engine = atelico.Engine()
result = json.loads(engine.classifier_predict_image(
    "animals",
    "/abs/path/to/cat.jpg",
    top_k=3,
))
print(result["label"], result["probability"])

DINOv2 vision embeddings

VisionEmbeddingModel ships two sizes:

  • DINOv2Small: facebook/dinov2-small, 384-dim embeddings. Default. Fast (~22M params); good for general object / scene categories.
  • DINOv2Large: facebook/dinov2-large, 1024-dim embeddings. Higher quality at ~300M params; worth the cost when classes are visually subtle.

DINOv2 produces strong general-purpose visual features without task-specific fine-tuning, which makes the Centroid classifier a surprisingly capable baseline for image tasks — try it first before reaching for SetFit fine-tuning.

The same VisionEmbedder is also exposed as a standalone embedder via atelico-embed if you only need raw image vectors (e.g. for similarity search, clustering, or feeding the Hybrid Search store).
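
A rough sketch of standalone use follows; the VisionEmbedder constructor and embed method names below are assumptions, so treat this as pseudocode against atelico-embed's actual surface:

use atelico_embed::{
    vision_embedder::{VisionEmbedder, VisionEmbedderConfig, VisionEmbeddingModel},
    EmbedInputOwned,
};
use std::path::PathBuf;

// `VisionEmbedder::new` and `embed` are assumed names, not confirmed API.
let embedder = VisionEmbedder::new(VisionEmbedderConfig {
    model: VisionEmbeddingModel::DINOv2Small,
    batch_size: 4,
})?;
let vectors = embedder.embed(&[EmbedInputOwned::image(PathBuf::from("cat.jpg"))])?;
assert_eq!(vectors[0].len(), 384); // DINOv2-Small dim, per the list above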

Persistence and serving

Trained classifiers persist to disk (safetensors for SetFit, JSON for centroid / KNN) and are loaded into the engine via the ATELICO_CLASSIFIERS environment variable on startup, or programmatically through the SDK and bindings shown above.

The same classifier infrastructure also powers the Guardrails ML-classifier layer for content moderation.