Agent LLM Interface — Client Developer Documentation

Introduction

The Agent LLM Interface module provides a unified framework for managing, selecting, and executing Large Language Models (LLMs) and optimizer blocks across multiple backends (OpenAI, AIOS gRPC, OrgLLM, custom). A model is declared in the Subject Specification via a ModelItem entry and executed at runtime through the KnownLLMs registry and the LLMInferenceAPI interface.

It provides:

  • Pluggable backends (OpenAI / AIOS gRPC / OrgLLM / Custom)
  • Single & batched prompt execution
  • Per-backend parameters (model, temperature, max tokens, etc.)
  • Consistent API for local and remote execution
  • Planner integration for dynamic model selection

A typical workflow is:

  1. Declare a ModelItem with llm_type="inference" or llm_type="optimizer" and backend-specific parameters.
  2. Build an inference backend (OpenAI/AIOS/OrgLLM/Custom) from the model item.
  3. Register the backend in KnownLLMs.
  4. Run inference or estimation calls via a consistent API.

Registering LLMs in Subject Specification

LLMs are registered using a ModelItem entry.

Data Class Reference

from dataclasses import dataclass, field
from typing import Dict, Any

@dataclass
class ModelItem:
    llm_type: str
    llm_block_id: str
    llm_selection_query: Dict[str, Any] = field(default_factory=dict)
    llm_parameters: Dict[str, Any] = field(default_factory=dict)

For LLM inference, set llm_type = "inference" (or "optimizer" for optimizer blocks). llm_block_id identifies the model/provider target (e.g., "openai:gpt-4", "aios:llm-prod", "orgllm:planner-qa"). llm_selection_query can be used by your orchestrator to select a specific deployment, tenant, or region. llm_parameters carries provider-specific settings (e.g., model, temperature, max tokens, streaming).

Minimal Required Configuration

{
  "llm_type": "inference",
  "llm_block_id": "openai:gpt-4",
  "llm_parameters": {
    "model": "gpt-4",
    "temperature": 0.7
  }
}
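
In Python, the same minimal configuration maps directly onto the ModelItem dataclass from the Data Class Reference above. A small sketch (the import path for ModelItem is not shown in this document, so adjust it to your SDK layout):

# Equivalent ModelItem construction for the minimal configuration above.
# The ModelItem definition comes from the Data Class Reference; its import
# path is assumed here and may differ in your SDK layout.
gpt4_item = ModelItem(
    llm_type="inference",
    llm_block_id="openai:gpt-4",
    llm_parameters={"model": "gpt-4", "temperature": 0.7},
)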

Field Descriptions

Field                Location  Type  Description
llm_type             root      str   "inference" for LLMs, "optimizer" for optimizer blocks.
llm_block_id         root      str   Logical route to a backend target (e.g., openai:*, aios:*, orgllm:*).
llm_selection_query  root      dict  Optional selector (region, tier, tenant) used by your runtime to pick a concrete deployment.
llm_parameters       root      dict  Backend-specific configuration: model, temperature, max_tokens, streaming, etc.
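
How llm_selection_query is applied is left to your runtime; one plausible approach is to match the query against a catalogue of known deployments. The sketch below is illustrative only: select_deployment and the deployment dicts are hypothetical, not part of the SDK.

# Hypothetical helper: pick the first deployment that satisfies every
# key/value pair in llm_selection_query. Not part of the SDK.
def select_deployment(deployments: list, selection_query: dict) -> dict:
    for dep in deployments:
        if all(dep.get(k) == v for k, v in selection_query.items()):
            return dep
    raise LookupError(f"No deployment matches {selection_query!r}")

# Illustrative catalogue and query (values are made up).
deployment = select_deployment(
    [
        {"region": "us-east-1", "endpoint": "https://llm-us-east-1.example"},
        {"region": "eu-west-1", "endpoint": "https://llm-eu-west-1.example"},
    ],
    {"region": "us-east-1"},
)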

Example in a Subject Specification

{
  "subject_id": "agent-planner",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_selection_query": { "region": "us-east-1" },
      "llm_parameters": {
        "model": "gpt-4",
        "temperature": 0.7
      }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": {
        "grpc_url": "dns:///llm.aio.svc.cluster.local:9000",
        "timeout_s": 20
      }
    },
    {
      "llm_type": "optimizer",
      "llm_block_id": "orgllm:planner-qa",
      "llm_selection_query": { "tenant": "org-123" },
      "llm_parameters": {
        "grpc_url": "dns:///orgllm.qa.svc.cluster.local:9000",
        "temperature": 0
      }
    }
  ]
}

Import and Setup

from agents_sdk.llm import (
    KnownLLMs,
    KnownLLMOptimizers,
    BlocksQueryManager,
    LLMInferenceAPI,
    OpenAIInferenceAPI,
    AIOSInferenceAPI,
    OrgLLMInferenceAPI
)

Building an LLM Backend from a ModelItem

def build_llm_from_model_item(item: dict) -> LLMInferenceAPI:
    """Map a ModelItem dict to a concrete backend, keyed on the llm_block_id prefix."""
    block_id = item["llm_block_id"]
    params = item.get("llm_parameters", {})

    if block_id.startswith("openai:"):
        return OpenAIInferenceAPI(
            api_key=params.get("api_key") or params.get("api_key_ref"),
            model=params.get("model", "gpt-4"),
            temperature=params.get("temperature", 0.7)
        )
    if block_id.startswith("aios:"):
        return AIOSInferenceAPI(
            grpc_url=params["grpc_url"],
            timeout_s=params.get("timeout_s", 15)
        )
    if block_id.startswith("orgllm:"):
        return OrgLLMInferenceAPI(
            grpc_url=params["grpc_url"],
            temperature=params.get("temperature", 0.0)
        )
    raise NotImplementedError(f"Unknown LLM backend for {block_id}")
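
Given a subject spec like the earlier example, the inference backends can be built in a single pass. A minimal sketch, assuming spec is the parsed specification dict:

# Build every inference backend declared in a parsed subject spec.
# `spec` is assumed to be the specification JSON loaded into a dict.
inference_backends = {}
for model_item in spec.get("models", []):
    if model_item.get("llm_type") != "inference":
        continue  # optimizer blocks are handled separately
    inference_backends[model_item["llm_block_id"]] = build_llm_from_model_item(model_item)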

KnownLLMs Usage

llm_manager = KnownLLMs(
    base_url="http://registry.local",
    adhoc_inference_url="http://inference.local"
)

openai_item = {
  "llm_type": "inference",
  "llm_block_id": "openai:gpt-4",
  "llm_parameters": {"model": "gpt-4", "temperature": 0.7}
}
aios_item = {
  "llm_type": "inference",
  "llm_block_id": "aios:llm-prod",
  "llm_parameters": {"grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20}
}

openai_backend = build_llm_from_model_item(openai_item)
aios_backend = build_llm_from_model_item(aios_item)

llm_manager.create_custom_llm_block(
    id="openai-gpt4",
    description="OpenAI GPT-4",
    tags=["openai", "chat"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=openai_backend
)

llm_manager.create_custom_llm_block(
    id="aios-llm-prod",
    description="AIOS Production LLM",
    tags=["aios", "inference"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=aios_backend
)

# Run inference
resp = llm_manager.run_ai_inference(
    name="openai-gpt4",
    input_data={"prompts": [{"role": "user", "content": "Explain quantum computing"}]},
    session_id="sess-001",
    frame_ptr=b""
)

Example: OpenAI Backend

openai_backend = OpenAIInferenceAPI(
    api_key="YOUR_OPENAI_KEY",
    model="gpt-4",
    temperature=0.7
)

result = openai_backend.run_inference({
    "prompts": [{"role": "user", "content": "Write a haiku about the ocean"}]
})

Example: AIOS gRPC Backend

aios_backend = AIOSInferenceAPI(
    grpc_url="dns:///llm.aio.svc.cluster.local:9000",
    timeout_s=15
)

result = aios_backend.run_inference({
    "prompts": [{"role": "user", "content": "Summarize this text..."}]
})
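
The feature list above mentions batched prompt execution. Whether a backend accepts several prompts in a single call is backend-specific; a safe client-side pattern, given only the interface shown in these examples, is to loop over run_inference:

# Client-side batching over independent prompts. Whether a backend also
# supports multiple prompts in one run_inference call is backend-specific.
questions = [
    "Summarize this text...",
    "List the key assumptions in the plan.",
]
results = [
    aios_backend.run_inference({"prompts": [{"role": "user", "content": q}]})
    for q in questions
]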

Writing a Custom LLM Backend

Implement the abstract interface:

from agents_sdk.llm import LLMInferenceAPI

class MyCustomLLM(LLMInferenceAPI):
    def run_inference(self, input_data: dict):
        return "My custom response"

    def check_execute(self, query: dict):
        return True, {"status": "ok"}

    def get_current_estimates(self):
        return True, {"latency_ms": 5}

Register with KnownLLMs:

llm_manager.create_custom_llm_block(
    id="custom-llm",
    description="Custom LLM backend",
    tags=["custom"],
    input_protocols=["text/plain"],
    output_protocols=["text/plain"],
    block_inference_object=MyCustomLLM()
)
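
Once registered, the custom block is invoked through the manager in the same way as the built-in backends (the session_id and frame_ptr values here are illustrative):

resp = llm_manager.run_ai_inference(
    name="custom-llm",
    input_data={"prompts": [{"role": "user", "content": "ping"}]},
    session_id="sess-002",
    frame_ptr=b""
)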

Best Practices

  • Use llm_selection_query to dynamically choose deployments (e.g., region-based).
  • Keep API keys and secrets in api_key_ref / environment variables.
  • Prefer streaming for long responses where backend supports it.
  • Use check_execute() and get_current_estimates() before execution for feasibility and cost checks (see the sketch after this list).
  • Cache frequent responses when possible to save tokens and latency.
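
A sketch of the pre-flight check mentioned above, using only the check_execute and get_current_estimates methods from the custom backend interface (the query contents and latency threshold are illustrative):

# Illustrative pre-flight gate before dispatching work to a backend.
# Only the method names come from the interface; the query shape and the
# 1000 ms threshold are assumptions for this example.
backend = MyCustomLLM()

ok, status = backend.check_execute({"prompt_tokens": 512})
ready, estimates = backend.get_current_estimates()

if ok and ready and estimates.get("latency_ms", 0) < 1000:
    result = backend.run_inference(
        {"prompts": [{"role": "user", "content": "Explain quantum computing"}]}
    )
else:
    result = None  # fall back to another registered block or defer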

Quick Subject Spec Example (two models)

{
  "subject_id": "agent-qa",
  "subject_type": "agent",
  "models": [
    {
      "llm_type": "inference",
      "llm_block_id": "openai:gpt-4",
      "llm_parameters": { "model": "gpt-4", "temperature": 0.7 }
    },
    {
      "llm_type": "inference",
      "llm_block_id": "aios:llm-prod",
      "llm_parameters": { "grpc_url": "dns:///llm.aio.svc.cluster.local:9000", "timeout_s": 20 }
    }
  ]
}