# Hybrid & Lexical Search

The engine's KV store (LanceDB-backed) supports three retrieval modes: vector similarity, lexical (full-text BM25), and hybrid (a weighted blend of the two).
## When to use which mode
| Mode | Strength | Weakness | Use for |
|---|---|---|---|
| Vector | Semantic similarity ("happiness" matches "joyful") | Misses exact matches; weak on rare proper nouns | Fuzzy semantic recall, RAG, NPC memory |
| Lexical | Exact term match, BM25-style relevance | No semantic understanding | Item names, place names, dialogue lookups |
| Hybrid | Both | More compute per query | Most game-like retrieval — players say "where is the Sword of Arrows" — names matter, but so does intent |
## Setup: create an FTS index

A full-text-search index must be created once per (store, column) pair before lexical or hybrid queries can use it. Typical column choices: `key_text` (the search keys) or any string-typed value column.
**Python**

```python
import atelico

engine = atelico.Engine()
# ... create the store and insert entries ...
engine.kvstore_create_fts_index("memories", "key_text")
```

**Godot (GDScript)**

```gdscript
@onready var kv: AtelicoKvStoreNode = $AtelicoKvStore

func setup_fts() -> void:
    kv.kvstore_create_fts_index("memories", "key_text")
```

**Unity (C#)**

```csharp
engine.KvStore.CreateFtsIndex("memories", "key_text");
```

**Unreal (C++)**

```cpp
Atelico->KvStoreCreateFtsIndex(TEXT("memories"), TEXT("key_text"));
```

**C FFI**

```c
atelico_kvstore_create_fts_index(engine, "memories", "key_text");
```

**Rust SDK**

```rust
engine.kvstore().create_fts_index_sync("memories", "key_text")?;
# Ok::<_, atelico_sdk::SdkError>(())
```
The index is persisted with the LanceDB table — restart the engine and it's still there.
## Lexical (BM25) search
**Python**

```python
import json

query = json.dumps({
    "text": "sword of arrows",
    "column": "key_text",
    "filter": None,  # optional SQL WHERE clause
    "limit": 10,
})
rows = json.loads(engine.kvstore_lexical_query("memories", query))
for r in rows:
    # combined_score carries the BM25 score
    print(r["key_text"], r["combined_score"])
```

**Godot (GDScript)**

```gdscript
var query := JSON.stringify({
    "text": "sword of arrows",
    "column": "key_text",
    "filter": null,
    "limit": 10,
})
var rows: Array = JSON.parse_string(kv.kvstore_lexical_query("memories", query))
for r in rows:
    print("%s bm25=%.2f" % [r.key_text, r.combined_score])
```

**Unity (C#)**

```csharp
string query = JsonSerializer.Serialize(new {
    text = "sword of arrows",
    column = "key_text",
    filter = (string)null,
    limit = 10,
});
string rowsJson = engine.KvStore.LexicalQuery("memories", query);
```

**Unreal (C++)**

```cpp
const FString Query = TEXT(R"({
    "text": "sword of arrows",
    "column": "key_text",
    "filter": null,
    "limit": 10
})");
FString RowsJson = Atelico->KvStoreLexicalQuery(TEXT("memories"), Query);
```

**C FFI**

```c
const char *q = "{\"text\":\"sword of arrows\",\"column\":\"key_text\",\"limit\":10}";
const char *rows = NULL;
atelico_kvstore_lexical_query(engine, "memories", q, &rows);
```

**Rust SDK**

```rust
use atelico_sdk::LexicalSearchQuery;

let q = LexicalSearchQuery {
    text: "sword of arrows".into(),
    column: "key_text".into(),
    filter: None,
    limit: 10,
};
let rows = engine.kvstore().lexical_query_sync("memories", &q)?;
# Ok::<_, atelico_sdk::SdkError>(())
```
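To make "BM25-style relevance" concrete, here is a minimal, self-contained sketch of BM25 scoring in plain Python. This is purely illustrative — the engine delegates FTS scoring to LanceDB's index, not to this code — using the conventional `k1` and `b` defaults:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each whitespace-tokenised doc against the query terms."""
    tokenised = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenised) / len(tokenised)
    n = len(tokenised)
    scores = []
    for doc in tokenised:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenised if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rare terms weigh more
            freq = tf[term]
            # Term frequency saturates via k1; b penalises long documents.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = ["sword of arrows", "bow of arrows", "healing potion"]
scores = bm25_scores("sword of arrows", docs)
best = max(range(len(docs)), key=lambda i: scores[i])  # exact match ranks first
```

The exact-match document wins because it alone contains the rare term "sword"; the potion matches no query term and scores zero — exactly the behaviour that makes lexical search reliable for item and place names.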
## Hybrid (vector + lexical) search

The hybrid query takes both an embedding vector and a query text. Each branch returns up to `per_branch_limit` candidates (defaults to `2 * limit`); scores are min-max normalised independently and combined as `vector_weight * vec_norm + lexical_weight * lex_norm`. Items missing from one branch contribute 0 to that branch's component.
**Python**

```python
import json

query = json.dumps({
    "embeddings": query_vector,  # list[float] of dim matching the store
    "text": "sword of arrows",
    "fts_column": "key_text",
    "filter": "category = 'weapon'",  # optional SQL WHERE
    "limit": 5,
    "per_branch_limit": 0,  # 0 = auto (2 * limit)
    "vector_weight": 0.6,
    "lexical_weight": 0.4,
})
response = json.loads(engine.kvstore_hybrid_query("memories", query))
for row, trace in zip(response["results"], response["scores"]):
    print(f"{row['key_text']} merged={trace['merged_score']:.2f} "
          f"vec={trace['vector_score']} lex={trace['lexical_score']}")
```

**Godot (GDScript)**

```gdscript
var query := JSON.stringify({
    "embeddings": query_vector,
    "text": "sword of arrows",
    "fts_column": "key_text",
    "filter": "category = 'weapon'",
    "limit": 5,
    "per_branch_limit": 0,
    "vector_weight": 0.6,
    "lexical_weight": 0.4,
})
var response := JSON.parse_string(kv.kvstore_hybrid_query("memories", query))
for i in response.results.size():
    var row = response.results[i]
    var trace = response.scores[i]
    print("%s merged=%.2f" % [row.key_text, trace.merged_score])
```

**Unity (C#)**

```csharp
string query = JsonSerializer.Serialize(new {
    embeddings = queryVector, // float[] of dim matching the store
    text = "sword of arrows",
    fts_column = "key_text",
    filter = "category = 'weapon'",
    limit = 5,
    per_branch_limit = 0, // 0 = auto (2 * limit)
    vector_weight = 0.6f,
    lexical_weight = 0.4f,
});
string responseJson = engine.KvStore.HybridQuery("memories", query);
```

**Unreal (C++)**

```cpp
// Build the JSON request (embeddings array elided for brevity)
const FString Query = FString::Printf(TEXT(R"({
    "embeddings": [%s],
    "text": "sword of arrows",
    "fts_column": "key_text",
    "filter": "category = 'weapon'",
    "limit": 5,
    "per_branch_limit": 0,
    "vector_weight": 0.6,
    "lexical_weight": 0.4
})"), *EmbeddingsCsv);
FString ResponseJson = Atelico->KvStoreHybridQuery(TEXT("memories"), Query);
```

**C FFI**

```c
// Build the JSON query string however you prefer (sprintf, your JSON lib, etc.)
const char *response = NULL;
atelico_kvstore_hybrid_query(engine, "memories", query_json, &response);
```

**Rust SDK**

```rust
use atelico_sdk::HybridSearchQuery;

let q = HybridSearchQuery {
    embeddings: query_vector,
    text: "sword of arrows".into(),
    fts_column: "key_text".into(),
    filter: Some("category = 'weapon'".into()),
    limit: 5,
    per_branch_limit: 0,
    vector_weight: 0.6,
    lexical_weight: 0.4,
};
let (results, scores) = engine.kvstore().hybrid_query_sync("memories", &q)?;
for (row, trace) in results.iter().zip(&scores) {
    println!("{} merged={:.2}", row.key_text, trace.merged_score);
}
# Ok::<_, atelico_sdk::SdkError>(())
```
## Hybrid scoring details

- Per-branch min-max normalisation; vector distances are inverted to similarities first.
- Combined score: `vector_weight * vec_norm + lexical_weight * lex_norm`. Items missing from one branch contribute 0 to that branch's component (a row that only matched lexically still scores; it just doesn't get the vector boost).
- Sort descending, then truncate to `limit`.
The `scores` array returned alongside `results` carries per-row component scores so callers can trace why a row ranked where it did — vector-driven, lexical-driven, or both.
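The merge rules above can be reproduced in a few lines. This standalone sketch is illustrative only — it is not the engine's `merge_hybrid` code, and the negate-then-normalise distance inversion is one reasonable reading of "inverted to similarities":

```python
def merge_hybrid(vector_hits, lexical_hits, vector_weight=0.6, lexical_weight=0.4, limit=5):
    """vector_hits: {key: distance} (lower = closer); lexical_hits: {key: bm25} (higher = better)."""
    def min_max(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}

    # Invert vector distances to similarities so higher is better in both branches,
    # then normalise each branch independently.
    vec_norm = min_max({k: -d for k, d in vector_hits.items()})
    lex_norm = min_max(lexical_hits)

    merged = {}
    for key in set(vec_norm) | set(lex_norm):
        # A key missing from one branch contributes 0 to that branch's component.
        merged[key] = vector_weight * vec_norm.get(key, 0.0) + lexical_weight * lex_norm.get(key, 0.0)
    # Sort descending, truncate to limit.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:limit]

ranked = merge_hybrid(
    vector_hits={"a": 0.1, "b": 0.4, "c": 0.9},  # "a" is semantically closest
    lexical_hits={"b": 7.0, "d": 3.0},           # "b" is the best lexical match
)
```

Here "b" wins overall (0.6 × 0.625 + 0.4 × 1.0 = 0.775) even though "a" tops the vector branch (0.6 × 1.0 = 0.6) — the lexical boost tips it, which is exactly the "names matter, but so does intent" behaviour hybrid search is for.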
## Where to look in the code

- `atelico-search/src/query.rs` — `SearchMode`, `LexicalSearchQuery`, `HybridSearchQuery`, `HybridScoreTrace`.
- `atelico-search/src/store.rs` — `SearchStore::{create_fts_index, lexical_search, hybrid_search}` and the `merge_hybrid` reranker.
- `atelico-search/src/kv_store.rs` — KvStore wrappers used by the SDK and bindings (`create_fts_index`, `lexical_query`, `hybrid_query`).
- `atelico-search/tests/test_hybrid_search.rs` — end-to-end tests against a real tempdir LanceDB.