Agent LLM Interface — Client Developer Documentation
Introduction
The Agent LLM Interface module provides a unified framework for managing, selecting, and executing Large Language Models (LLMs) and optimizer blocks from multiple backends (OpenAI, AIOS gRPC, OrgLLM, custom). Models are declared in the Subject Specification via a ModelItem entry and executed at runtime through the KnownLLMs registry and the LLMInferenceAPI interface.
It provides:
- Pluggable backends (OpenAI / AIOS gRPC / OrgLLM / Custom)
- Single & batched prompt execution
- Per-backend parameters (model, temperature, max tokens, etc.)
- Consistent API for local and remote execution
- Planner integration for dynamic model selection
A typical workflow is:
- Declare a ModelItem with llm_type="inference" (or llm_type="optimizer") and backend-specific parameters.
- Build an inference backend (OpenAI/AIOS/OrgLLM/Custom) from the model item.
- Register the backend in KnownLLMs.
- Run inference or estimation calls via a consistent API.
Registering LLMs in Subject Specification
LLMs are registered using a ModelItem entry.
Data Class Reference
from dataclasses import dataclass, field
from typing import Dict, Any

@dataclass
class ModelItem:
    llm_type: str
    llm_block_id: str
    llm_selection_query: Dict[str, Any] = field(default_factory=dict)
    llm_parameters: Dict[str, Any] = field(default_factory=dict)
- llm_type: set to "inference" for LLM inference, or "optimizer" for optimizer blocks.
- llm_block_id: identifies the model/provider target (e.g., "openai:gpt-4", "aios:llm-prod", "orgllm:planner-qa").
- llm_selection_query: can be used by your orchestrator to select a specific deployment, tenant, or region.
- llm_parameters: carries provider-specific settings (e.g., model, temperature, max tokens, streaming).
Minimal Required Configuration
{
  "llm_type": "inference",
  "llm_block_id": "openai:gpt-4",
  "llm_parameters": {
    "model": "gpt-4",
    "temperature": 0.7
  }
}
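The same configuration can also be constructed programmatically. A minimal sketch, assuming the ModelItem dataclass shown above is importable from agents_sdk.llm (the exact module path is an assumption):

from agents_sdk.llm import ModelItem  # assumed import path for the dataclass shown above

planner_model = ModelItem(
    llm_type="inference",
    llm_block_id="openai:gpt-4",
    llm_parameters={"model": "gpt-4", "temperature": 0.7},
)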
Field Descriptions
Field | Location | Type | Description |
---|---|---|---|
llm_type | root | str | "inference" for LLMs, "optimizer" for optimizer blocks. |
llm_block_id | root | str | Logical route to a backend target (e.g., openai:*, aios:*, orgllm:*). |
llm_selection_query | root | dict | Optional selector (region, tier, tenant) used by your runtime to pick a concrete deployment. |
llm_parameters | root | dict | Backend-specific configuration: model, temperature, max_tokens, streaming, etc. |
Example in a Subject Specification
{
  "subject_id": "agent-planner",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_selection_query": { "region": "us-east-1" },
      "llm_parameters": {
        "model": "gpt-4",
        "temperature": 0.7
      }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": {
        "grpc_url": "dns:///llm.aio.svc.cluster.local:9000",
        "timeout_s": 20
      }
    },
    {
      "llm_type": "optimizer",
      "llm_block_id": "orgllm:planner-qa",
      "llm_selection_query": { "tenant": "org-123" },
      "llm_parameters": {
        "grpc_url": "dns:///orgllm.qa.svc.cluster.local:9000",
        "temperature": 0
      }
    }
  ]
}
Import and Setup
from agents_sdk.llm import (
    KnownLLMs,
    KnownLLMOptimizers,
    BlocksQueryManager,
    LLMInferenceAPI,
    OpenAIInferenceAPI,
    AIOSInferenceAPI,
    OrgLLMInferenceAPI
)
Building an LLM Backend from a ModelItem
def build_llm_from_model_item(item: dict) -> LLMInferenceAPI:
    """Build an inference backend from a ModelItem-style dict, dispatching on the llm_block_id prefix."""
    block_id = item["llm_block_id"]
    params = item.get("llm_parameters", {})

    if block_id.startswith("openai:"):
        return OpenAIInferenceAPI(
            api_key=params.get("api_key") or params.get("api_key_ref"),
            model=params.get("model", "gpt-4"),
            temperature=params.get("temperature", 0.7)
        )

    if block_id.startswith("aios:"):
        return AIOSInferenceAPI(
            grpc_url=params["grpc_url"],
            timeout_s=params.get("timeout_s", 15)
        )

    if block_id.startswith("orgllm:"):
        return OrgLLMInferenceAPI(
            grpc_url=params["grpc_url"],
            temperature=params.get("temperature", 0.0)
        )

    raise NotImplementedError(f"Unknown LLM backend for {block_id}")
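A brief usage sketch, assuming the Subject Specification from the previous section has been saved to a file named subject_spec.json (the file name is illustrative):

import json

# Load the Subject Specification and build one backend per inference ModelItem.
with open("subject_spec.json") as f:
    spec = json.load(f)

backends = {
    item["llm_block_id"]: build_llm_from_model_item(item)
    for item in spec.get("models", [])
    if item.get("llm_type") == "inference"
}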
KnownLLMs Usage
llm_manager = KnownLLMs(
    base_url="http://registry.local",
    adhoc_inference_url="http://inference.local"
)

openai_item = {
    "llm_type": "inference",
    "llm_block_id": "openai:gpt-4",
    "llm_parameters": {"model": "gpt-4", "temperature": 0.7}
}
aios_item = {
    "llm_type": "inference",
    "llm_block_id": "aios:llm-prod",
    "llm_parameters": {"grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20}
}

openai_backend = build_llm_from_model_item(openai_item)
aios_backend = build_llm_from_model_item(aios_item)

llm_manager.create_custom_llm_block(
    id="openai-gpt4",
    description="OpenAI GPT-4",
    tags=["openai", "chat"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=openai_backend
)
llm_manager.create_custom_llm_block(
    id="aios-llm-prod",
    description="AIOS Production LLM",
    tags=["aios", "inference"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=aios_backend
)

# Run inference
resp = llm_manager.run_ai_inference(
    name="openai-gpt4",
    input_data={"prompts": [{"role": "user", "content": "Explain quantum computing"}]},
    session_id="sess-001",
    frame_ptr=b""
)
Example: OpenAI Backend
openai_backend = OpenAIInferenceAPI(
    api_key="YOUR_OPENAI_KEY",
    model="gpt-4",
    temperature=0.7
)
result = openai_backend.run_inference({
    "prompts": [{"role": "user", "content": "Write a haiku about the ocean"}]
})
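Since the prompts field is a list, multi-message conversations can be passed in the same shape. A hedged sketch, assuming the backend accepts OpenAI-style system/user roles (only the user role is confirmed by the examples in this document):

result = openai_backend.run_inference({
    "prompts": [
        {"role": "system", "content": "You are a concise assistant."},  # system-role support is assumed
        {"role": "user", "content": "Write a haiku about the ocean"}
    ]
})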
Example: AIOS gRPC Backend
aios_backend = AIOSInferenceAPI(
    grpc_url="dns:///llm.aio.svc.cluster.local:9000",
    timeout_s=15
)
result = aios_backend.run_inference({
    "prompts": [{"role": "user", "content": "Summarize this text..."}]
})
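No standalone OrgLLM example is shown above; a sketch by analogy, reusing the constructor parameters that build_llm_from_model_item passes to OrgLLMInferenceAPI:

orgllm_backend = OrgLLMInferenceAPI(
    grpc_url="dns:///orgllm.qa.svc.cluster.local:9000",
    temperature=0.0
)
result = orgllm_backend.run_inference({
    "prompts": [{"role": "user", "content": "Score this candidate plan..."}]
})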
Writing a Custom LLM Backend
Implement the abstract interface:
from agents_sdk.llm import LLMInferenceAPI

class MyCustomLLM(LLMInferenceAPI):
    def run_inference(self, input_data: dict):
        return "My custom response"

    def check_execute(self, query: dict):
        return True, {"status": "ok"}

    def get_current_estimates(self):
        return True, {"latency_ms": 5}
Register it with KnownLLMs:
llm_manager.create_custom_llm_block(
    id="custom-llm",
    description="Custom LLM backend",
    tags=["custom"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=MyCustomLLM()
)
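Once registered, the custom block is invoked like any other backend. A sketch, reusing the run_ai_inference signature shown earlier (the session_id value is illustrative):

resp = llm_manager.run_ai_inference(
    name="custom-llm",
    input_data={"prompts": [{"role": "user", "content": "ping"}]},
    session_id="sess-002",
    frame_ptr=b""
)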
Best Practices
- Use llm_selection_query to dynamically choose deployments (e.g., region-based).
- Keep API keys and secrets in api_key_ref / environment variables rather than in the spec itself (see the sketch after this list).
- Prefer streaming for long responses where the backend supports it.
- Use estimate() before execution for cost and feasibility checks.
- Cache frequent responses when possible to save tokens and latency.
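A minimal sketch of the secrets and pre-flight bullets above, assuming the key lives in an OPENAI_API_KEY environment variable and that the backend implements the check_execute / get_current_estimates methods from the custom interface (the variable name, and both method names on non-custom backends, are assumptions):

import os

item = {
    "llm_type": "inference",
    "llm_block_id": "openai:gpt-4",
    "llm_parameters": {
        "api_key": os.environ.get("OPENAI_API_KEY", ""),  # resolved from the environment, not stored in the spec
        "model": "gpt-4",
        "temperature": 0.7
    }
}
backend = build_llm_from_model_item(item)

# Cheap pre-flight checks before committing to a full inference call.
can_run, status = backend.check_execute({"prompts": [{"role": "user", "content": "ping"}]})
ok, estimates = backend.get_current_estimates()
if can_run:
    result = backend.run_inference({"prompts": [{"role": "user", "content": "ping"}]})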
Quick Subject Spec Example (two models)
{
  "subject_id": "agent-qa",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_parameters": { "model": "gpt-4", "temperature": 0.7 }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": { "grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20 }
    }
  ]
}