Agent LLM Interface — Client Developer Documentation
Introduction
The Agent LLM Interface module provides a unified framework for managing, selecting, and executing Large Language Models (LLMs) and optimizer blocks from multiple backends (OpenAI, AIOS gRPC, OrgLLM, custom). Models are declared in the Subject Specification via a ModelItem entry and executed at runtime through the KnownLLMs registry and the LLMInferenceAPI interface.
It provides:
- Pluggable backends (OpenAI / AIOS gRPC / OrgLLM / Custom)
- Single & batched prompt execution
- Per-backend parameters (model, temperature, max tokens, etc.)
- Consistent API for local and remote execution
- Planner integration for dynamic model selection
A typical workflow is:
- Declare a ModelItem with llm_type="inference" (or llm_type="optimizer") and backend-specific parameters.
- Build an inference backend (OpenAI/AIOS/OrgLLM/Custom) from the model item.
- Register the backend in KnownLLMs.
- Run inference or estimation calls via a consistent API.
Registering LLMs in Subject Specification
LLMs are registered using a ModelItem entry.
Data Class Reference
from dataclasses import dataclass, field
from typing import Dict, Any

@dataclass
class ModelItem:
    llm_type: str
    llm_block_id: str
    llm_selection_query: Dict[str, Any] = field(default_factory=dict)
    llm_parameters: Dict[str, Any] = field(default_factory=dict)
- llm_type: set to "inference" for LLM inference, or "optimizer" for optimizer blocks.
- llm_block_id: identifies the model/provider target (e.g., "openai:gpt-4", "aios:llm-prod", "orgllm:planner-qa").
- llm_selection_query: can be used by your orchestrator to select a specific deployment, tenant, or region.
- llm_parameters: carries provider-specific settings (e.g., model, temperature, max tokens, streaming).
Minimal Required Configuration
{
  "llm_type": "inference",
  "llm_block_id": "openai:gpt-4",
  "llm_parameters": {
    "model": "gpt-4",
    "temperature": 0.7
  }
}
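The same configuration can also be constructed programmatically. A minimal sketch, assuming the ModelItem dataclass shown above is importable from agents_sdk.llm (the exact module path is an assumption):

from agents_sdk.llm import ModelItem  # assumed import path for the dataclass shown above

planner_model = ModelItem(
    llm_type="inference",
    llm_block_id="openai:gpt-4",
    llm_parameters={"model": "gpt-4", "temperature": 0.7},
)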
Field Descriptions
Field | Location | Type | Description |
---|---|---|---|
llm_type | root | str | "inference" for LLMs, "optimizer" for optimizer blocks. |
llm_block_id | root | str | Logical route to a backend target (e.g., openai:*, aios:*, orgllm:*). |
llm_selection_query | root | dict | Optional selector (region, tier, tenant) used by your runtime to pick a concrete deployment. |
llm_parameters | root | dict | Backend-specific configuration: model, temperature, max_tokens, streaming, etc. |
Example in a Subject Specification
{
  "subject_id": "agent-planner",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_selection_query": { "region": "us-east-1" },
      "llm_parameters": {
        "model": "gpt-4",
        "temperature": 0.7
      }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": {
        "grpc_url": "dns:///llm.aio.svc.cluster.local:9000",
        "timeout_s": 20
      }
    },
    {
      "llm_type": "optimizer",
      "llm_block_id": "orgllm:planner-qa",
      "llm_selection_query": { "tenant": "org-123" },
      "llm_parameters": {
        "grpc_url": "dns:///orgllm.qa.svc.cluster.local:9000",
        "temperature": 0
      }
    }
  ]
}
Import and Setup
from agents_sdk.llm import (
    KnownLLMs,
    KnownLLMOptimizers,
    BlocksQueryManager,
    LLMInferenceAPI,
    OpenAIInferenceAPI,
    AIOSInferenceAPI,
    OrgLLMInferenceAPI
)
Building an LLM Backend from a ModelItem
def build_llm_from_model_item(item: dict) -> LLMInferenceAPI:
    """Build an inference backend from a ModelItem-style dict, dispatching on the llm_block_id prefix."""
    block_id = item["llm_block_id"]
    params = item.get("llm_parameters", {})

    if block_id.startswith("openai:"):
        return OpenAIInferenceAPI(
            api_key=params.get("api_key") or params.get("api_key_ref"),
            model=params.get("model", "gpt-4"),
            temperature=params.get("temperature", 0.7)
        )

    if block_id.startswith("aios:"):
        return AIOSInferenceAPI(
            grpc_url=params["grpc_url"],
            timeout_s=params.get("timeout_s", 15)
        )

    if block_id.startswith("orgllm:"):
        return OrgLLMInferenceAPI(
            grpc_url=params["grpc_url"],
            temperature=params.get("temperature", 0.0)
        )

    raise NotImplementedError(f"Unknown LLM backend for {block_id}")
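A brief usage sketch, assuming the Subject Specification from the previous section has been saved to a file named subject_spec.json (the file name is illustrative):

import json

# Load the Subject Specification and build one backend per inference ModelItem.
with open("subject_spec.json") as f:
    spec = json.load(f)

backends = {
    item["llm_block_id"]: build_llm_from_model_item(item)
    for item in spec.get("models", [])
    if item.get("llm_type") == "inference"
}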
KnownLLMs Usage
llm_manager = KnownLLMs(
    base_url="http://registry.local",
    adhoc_inference_url="http://inference.local"
)

openai_item = {
    "llm_type": "inference",
    "llm_block_id": "openai:gpt-4",
    "llm_parameters": {"model": "gpt-4", "temperature": 0.7}
}
aios_item = {
    "llm_type": "inference",
    "llm_block_id": "aios:llm-prod",
    "llm_parameters": {"grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20}
}

openai_backend = build_llm_from_model_item(openai_item)
aios_backend = build_llm_from_model_item(aios_item)

llm_manager.create_custom_llm_block(
    id="openai-gpt4",
    description="OpenAI GPT-4",
    tags=["openai", "chat"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=openai_backend
)
llm_manager.create_custom_llm_block(
    id="aios-llm-prod",
    description="AIOS Production LLM",
    tags=["aios", "inference"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=aios_backend
)

# Run inference
resp = llm_manager.run_ai_inference(
    name="openai-gpt4",
    input_data={"prompts": [{"role": "user", "content": "Explain quantum computing"}]},
    session_id="sess-001",
    frame_ptr=b""
)
Example: OpenAI Backend
openai_backend = OpenAIInferenceAPI(
    api_key="YOUR_OPENAI_KEY",
    model="gpt-4",
    temperature=0.7
)
result = openai_backend.run_inference({
    "prompts": [{"role": "user", "content": "Write a haiku about the ocean"}]
})
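Since the prompts field is a list, multi-message conversations can be passed in the same shape. A hedged sketch, assuming the backend accepts OpenAI-style system/user roles (only the user role is confirmed by the examples in this document):

result = openai_backend.run_inference({
    "prompts": [
        {"role": "system", "content": "You are a concise assistant."},  # system-role support is assumed
        {"role": "user", "content": "Write a haiku about the ocean"}
    ]
})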
Example: AIOS gRPC Backend
aios_backend = AIOSInferenceAPI(
    grpc_url="dns:///llm.aio.svc.cluster.local:9000",
    timeout_s=15
)
result = aios_backend.run_inference({
    "prompts": [{"role": "user", "content": "Summarize this text..."}]
})
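No standalone OrgLLM example is shown above; a sketch by analogy, reusing the constructor parameters that build_llm_from_model_item passes to OrgLLMInferenceAPI:

orgllm_backend = OrgLLMInferenceAPI(
    grpc_url="dns:///orgllm.qa.svc.cluster.local:9000",
    temperature=0.0
)
result = orgllm_backend.run_inference({
    "prompts": [{"role": "user", "content": "Score this candidate plan..."}]
})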
Writing a Custom LLM Backend
Implement the abstract interface:
from agents_sdk.llm import LLMInferenceAPI

class MyCustomLLM(LLMInferenceAPI):
    def run_inference(self, input_data: dict):
        return "My custom response"

    def check_execute(self, query: dict):
        return True, {"status": "ok"}

    def get_current_estimates(self):
        return True, {"latency_ms": 5}
Register it with KnownLLMs:
llm_manager.create_custom_llm_block(
    id="custom-llm",
    description="Custom LLM backend",
    tags=["custom"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=MyCustomLLM()
)
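Once registered, the custom block is invoked like any other backend. A sketch, reusing the run_ai_inference signature shown earlier (the session_id value is illustrative):

resp = llm_manager.run_ai_inference(
    name="custom-llm",
    input_data={"prompts": [{"role": "user", "content": "ping"}]},
    session_id="sess-002",
    frame_ptr=b""
)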
Best Practices
- Use llm_selection_query to dynamically choose deployments (e.g., region-based).
- Keep API keys and secrets in api_key_ref / environment variables rather than in the spec itself (see the sketch after this list).
- Prefer streaming for long responses where the backend supports it.
- Use estimate() before execution for cost and feasibility checks.
- Cache frequent responses when possible to save tokens and latency.
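A minimal sketch of the secrets and pre-flight bullets above, assuming the key lives in an OPENAI_API_KEY environment variable and that the backend implements the check_execute / get_current_estimates methods from the custom interface (the variable name, and both method names on non-custom backends, are assumptions):

import os

item = {
    "llm_type": "inference",
    "llm_block_id": "openai:gpt-4",
    "llm_parameters": {
        "api_key": os.environ.get("OPENAI_API_KEY", ""),  # resolved from the environment, not stored in the spec
        "model": "gpt-4",
        "temperature": 0.7
    }
}
backend = build_llm_from_model_item(item)

# Cheap pre-flight checks before committing to a full inference call.
can_run, status = backend.check_execute({"prompts": [{"role": "user", "content": "ping"}]})
ok, estimates = backend.get_current_estimates()
if can_run:
    result = backend.run_inference({"prompts": [{"role": "user", "content": "ping"}]})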
Quick Subject Spec Example (two models)
{
  "subject_id": "agent-qa",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_parameters": { "model": "gpt-4", "temperature": 0.7 }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": { "grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20 }
    }
  ]
}