# Agent Embeddings — Client Developer Documentation

## Introduction

The Agent Embeddings module provides a unified interface for generating embeddings from multiple backends (OpenAI, Hugging Face, and remote AIOS blocks via gRPC). It is declared in the Subject Specification via a `ModelItem` and invoked at runtime through the `EmbeddingGeneratorManager`.
It provides:
- Pluggable backends (OpenAI / Hugging Face / AIOS gRPC / Custom)
- Single & batched embedding generation
- Per-backend parameters (model, dims, normalize, etc.)
- Consistent API across local and remote execution

A typical workflow is:

- Declare a `ModelItem` with `llm_type="embeddings"` and backend-specific parameters.
- Build a generator (OpenAI / Hugging Face / AIOS) from the model item.
- Register the generator in the `EmbeddingGeneratorManager`.
- Generate embeddings for single inputs or batches (sketched end-to-end below).
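
Putting those steps together, here is a minimal end-to-end sketch. It assumes the classes shown under Import and Setup below and uses a hypothetical model item; constructor arguments follow the examples later in this document.

```python
from agent_sdk.embeddings.generator import (
    OpenAIEmbeddingGenerator,
    EmbeddingGeneratorManager,
)

# 1. Declare a model item (normally part of the Subject Specification).
item = {
    "llm_type": "embeddings",
    "llm_block_id": "openai:text-embedding-3-small",
    "llm_parameters": {"model": "text-embedding-3-small", "normalize": True},
}

# 2. Build a generator from the item's parameters.
gen = OpenAIEmbeddingGenerator(
    api_key="YOUR_OPENAI_KEY",
    model=item["llm_parameters"]["model"],
    normalize=item["llm_parameters"]["normalize"],
)

# 3. Register it under a logical name.
manager = EmbeddingGeneratorManager()
manager.register_generator("openai", gen)

# 4. Generate embeddings for single inputs or batches.
vec = manager.generate("openai", "What is AI?")
vecs = manager.generate_batch("openai", ["Hello", "World"])
```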

## Registering Embeddings in the Subject Specification

Embeddings are registered using a `ModelItem` entry.

### Data Class Reference

```python
from dataclasses import dataclass, field
from typing import Dict, Any


@dataclass
class ModelItem:
    llm_type: str
    llm_block_id: str
    llm_selection_query: Dict[str, Any] = field(default_factory=dict)
    llm_parameters: Dict[str, Any] = field(default_factory=dict)
```

For embeddings:

- `llm_type`: set to `"embeddings"`.
- `llm_block_id`: identifies the model/provider target (e.g., `"openai:text-embedding-3-small"`, `"hf:sentence-transformers/all-MiniLM-L6-v2"`, `"aios:embeddings-prod"`).
- `llm_selection_query`: can be used by your orchestrator to select a specific deployment, tenant, or region.
- `llm_parameters`: carries provider-specific settings (e.g., model name, dimensions, normalize, batch size).

### Minimal Required Configuration

```json
{
  "llm_type": "embeddings",
  "llm_block_id": "openai:text-embedding-3-small",
  "llm_selection_query": {},
  "llm_parameters": {
    "model": "text-embedding-3-small",
    "normalize": true
  }
}
```
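
Because the JSON keys mirror the dataclass fields one-to-one, a configuration like this can be loaded directly. A small sketch using the `ModelItem` dataclass defined above:

```python
import json

raw = '{"llm_type": "embeddings", "llm_block_id": "openai:text-embedding-3-small", "llm_selection_query": {}, "llm_parameters": {"model": "text-embedding-3-small", "normalize": true}}'

# JSON keys map one-to-one onto the dataclass fields, so ** unpacking works.
item = ModelItem(**json.loads(raw))
assert item.llm_type == "embeddings"
assert item.llm_parameters["normalize"] is True
```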

### Field Descriptions

| Field | Location | Type | Description |
|---|---|---|---|
| `llm_type` | root | `str` | Must be `"embeddings"` for the embeddings generator. |
| `llm_block_id` | root | `str` | Logical route to a backend target (e.g., `openai:*`, `hf:*`, `aios:*`). |
| `llm_selection_query` | root | `dict` | Optional selector (region, tier, tenant) used by your runtime to pick a concrete deployment. |
| `llm_parameters` | root | `dict` | Backend-specific configuration: `model`, `dims`, `normalize`, `batch_size`, `api_key_ref`, etc. |
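
If you validate model items before wiring them up, a small check along these lines can catch misconfiguration early. This is a hypothetical helper, not part of the SDK:

```python
def validate_embeddings_item(item: dict) -> None:
    """Raise ValueError if a model item is not a usable embeddings entry."""
    if item.get("llm_type") != "embeddings":
        raise ValueError(f"expected llm_type 'embeddings', got {item.get('llm_type')!r}")
    block_id = item.get("llm_block_id", "")
    if not any(block_id.startswith(p) for p in ("openai:", "hf:", "aios:")):
        raise ValueError(f"unrecognized backend route: {block_id!r}")
    if not isinstance(item.get("llm_parameters", {}), dict):
        raise ValueError("llm_parameters must be a dict")
```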

### Example in a Subject Specification

```json
{
  "subject_id": "agent-retriever",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "embeddings",
      "llm_block_id": "openai:text-embedding-3-small",
      "llm_selection_query": { "region": "ap-south-1" },
      "llm_parameters": {
        "model": "text-embedding-3-small",
        "normalize": true
      }
    },
    {
      "llm_type": "embeddings",
      "llm_block_id": "hf:sentence-transformers/all-MiniLM-L6-v2",
      "llm_parameters": {
        "device": "cuda",
        "batch_size": 64,
        "normalize": true
      }
    },
    {
      "llm_type": "embeddings",
      "llm_block_id": "aios:embeddings-prod",
      "llm_selection_query": { "tenant": "org-123", "tier": "prod" },
      "llm_parameters": {
        "grpc_url": "dns:///embeddings.aio.svc.cluster.local:9000",
        "timeout_s": 20,
        "dims": 768
      }
    }
  ]
}
```
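
How `llm_selection_query` is interpreted is up to your orchestrator. One simple convention, shown here as a hypothetical helper, is to pick the first embeddings entry whose selection query matches the keys you care about:

```python
def select_embeddings_item(models: list[dict], **query) -> dict | None:
    """Return the first embeddings model item matching every query key."""
    for item in models:
        if item.get("llm_type") != "embeddings":
            continue
        sel = item.get("llm_selection_query", {})
        if all(sel.get(k) == v for k, v in query.items()):
            return item
    return None

# e.g., pick the ap-south-1 OpenAI entry from the spec above:
# item = select_embeddings_item(spec["models"], region="ap-south-1")
```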

## Import and Setup

```python
from agent_sdk.embeddings.generator import (
    AbstractEmbeddingGenerator,
    OpenAIEmbeddingGenerator,
    HuggingFaceEmbeddingGenerator,
    AIOSEmbeddings,
    EmbeddingGeneratorManager,
)
```

## Building a Generator from a ModelItem

```python
def build_embeddings_from_model_item(item: dict) -> AbstractEmbeddingGenerator:
    block_id = item["llm_block_id"]
    params = item.get("llm_parameters", {})

    if block_id.startswith("openai:"):
        return OpenAIEmbeddingGenerator(
            api_key=params.get("api_key") or params.get("api_key_ref"),
            model=params.get("model", "text-embedding-3-small"),
            normalize=params.get("normalize", True),
        )

    if block_id.startswith("hf:"):
        return HuggingFaceEmbeddingGenerator(
            model_name=block_id.split("hf:", 1)[1],
            device=params.get("device", "cpu"),
            batch_size=params.get("batch_size", 32),
            normalize=params.get("normalize", True),
        )

    if block_id.startswith("aios:"):
        return AIOSEmbeddings(
            grpc_url=params["grpc_url"],
            dims=params.get("dims", 768),
            timeout_s=params.get("timeout_s", 15),
            normalize=params.get("normalize", True),
        )

    raise NotImplementedError(f"Unknown embeddings backend for {block_id}")
```

## EmbeddingGeneratorManager

```python
manager = EmbeddingGeneratorManager()

openai_item = {
    "llm_type": "embeddings",
    "llm_block_id": "openai:text-embedding-3-small",
    "llm_parameters": {"model": "text-embedding-3-small", "normalize": True},
}
hf_item = {
    "llm_type": "embeddings",
    "llm_block_id": "hf:sentence-transformers/all-MiniLM-L6-v2",
    "llm_parameters": {"device": "cuda", "batch_size": 64, "normalize": True},
}

openai_gen = build_embeddings_from_model_item(openai_item)
hf_gen = build_embeddings_from_model_item(hf_item)

manager.register_generator("openai", openai_gen)
manager.register_generator("hf-minilm", hf_gen)

# Single
vec = manager.generate("openai", "What is AI?")

# Batch
vecs = manager.generate_batch("hf-minilm", ["Hello", "World"])
```
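
To wire a whole Subject Specification at once, you can loop over its `models` list. A sketch assuming the spec has been loaded as a dict; deriving registration names from `llm_block_id` is a convention of this example, not the SDK:

```python
def register_all_embeddings(manager: EmbeddingGeneratorManager, spec: dict) -> None:
    for item in spec.get("models", []):
        if item.get("llm_type") != "embeddings":
            continue
        gen = build_embeddings_from_model_item(item)
        # Use the block id as the registration key,
        # e.g. "openai:text-embedding-3-small".
        manager.register_generator(item["llm_block_id"], gen)
```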

## Usage Guide

### OpenAI Example

```python
openai_gen = OpenAIEmbeddingGenerator(
    api_key="YOUR_OPENAI_KEY",
    model="text-embedding-3-small",
    normalize=True,
)

v = openai_gen.generate("vectorize this")
V = openai_gen.generate_batch(["one", "two", "three"])
```

### Hugging Face Example

```python
hf_gen = HuggingFaceEmbeddingGenerator(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device="cuda",
    batch_size=64,
    normalize=True,
)

v = hf_gen.generate("vectorize this locally")
V = hf_gen.generate_batch(["one", "two"])
```

### AIOS gRPC Example

```python
aios_gen = AIOSEmbeddings(
    grpc_url="dns:///embeddings.aio.svc.cluster.local:9000",
    dims=768,
    timeout_s=15,
    normalize=True,
)

v = aios_gen.generate("remote embedding through AIOS gRPC")
```
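
Remote calls can fail transiently, so a defensive wrapper with exponential backoff is often worthwhile. A sketch only; narrow the `except` to the concrete error type raised by your gRPC stack:

```python
import time

def generate_with_retry(gen, text: str, attempts: int = 3, backoff_s: float = 1.0):
    for i in range(attempts):
        try:
            return gen.generate(text)
        except Exception:  # replace with the concrete gRPC/timeout error type
            if i == attempts - 1:
                raise
            time.sleep(backoff_s * (2 ** i))  # 1s, 2s, 4s, ...

v = generate_with_retry(aios_gen, "remote embedding through AIOS gRPC")
```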

## Writing a Custom Embeddings Generator

Implement the base interface:

```python
import math

from agent_sdk.embeddings.generator import AbstractEmbeddingGenerator


class MyCustomEmbeddings(AbstractEmbeddingGenerator):
    def __init__(self, dims=384, normalize=False):
        super().__init__()
        self.dims = dims
        self.normalize = normalize

    def generate(self, text: str):
        vec = self._infer(text)  # your logic
        return self._post(vec)

    def generate_batch(self, texts: list):
        return [self._post(self._infer(t)) for t in texts]

    def _infer(self, text):
        # return list[float] of length self.dims
        ...

    def _post(self, vec):
        if self.normalize:
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            vec = [x / norm for x in vec]
        return vec
```

Register with the manager:

```python
manager.register_generator("custom", MyCustomEmbeddings(dims=256, normalize=True))
```
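
Once registered, the custom backend is called like any other, assuming `_infer` has been implemented to return a `dims`-length vector:

```python
vec = manager.generate("custom", "hello world")
assert len(vec) == 256
```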

## Best-Practice Patterns

- Normalize vectors (`normalize=True`) if your downstream vector DB expects cosine similarity (see the sketch after this list).
- Batch where possible for throughput (`generate_batch`).
- Pin the device for HF (`device="cuda"` or `"cpu"`) and pick a sensible `batch_size`.
- Separate concerns: use `ModelItem` only for selection/config; keep runtime secrets (API keys) in env/secret refs.
- Observability: record latency, batch size, dims, and provider in metrics for capacity planning.
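
On the first bullet: normalization matters because cosine similarity on unit-length vectors reduces to a plain dot product, which is what dot-product vector indexes compute. A small illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# For vectors produced with normalize=True, na == nb == 1.0,
# so cosine(a, b) equals the raw dot product.
```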

## Quick Subject Spec Example (two models)

```json
{
  "subject_id": "agent-rag",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "embeddings",
      "llm_block_id": "openai:text-embedding-3-small",
      "llm_parameters": { "model": "text-embedding-3-small", "normalize": true }
    },
    {
      "llm_type": "embeddings",
      "llm_block_id": "hf:sentence-transformers/all-MiniLM-L6-v2",
      "llm_parameters": { "device": "cuda", "batch_size": 64, "normalize": true }
    }
  ]
}
```