LLM Providers
The Daita framework provides a unified interface for working with multiple Large Language Model (LLM) providers. The system uses a factory pattern that allows easy switching between providers while maintaining a consistent interface for agents.
#Overview
Daita supports six LLM providers out of the box with full streaming support:
- OpenAI - GPT-5.4 mini, GPT-5.4, GPT-5.5, and other OpenAI models
- Anthropic - Claude family models (Haiku, Sonnet, Opus)
- xAI Grok - Grok 4 and vision models
- Google Gemini - Gemini 2.5 Flash/Lite and Gemini Pro models
- Ollama - Local models (Llama, Mistral, Gemma, CodeStral, Phi, etc.)
- Mock Provider - Testing and development without API calls
All providers support real-time streaming for both text generation and tool calling, enabling transparent agent execution with live progress updates.
#Environment Variables
Set API keys for the providers you'll use:
| Provider | Environment Variable | Key Format |
|---|---|---|
| OpenAI | OPENAI_API_KEY | sk-... |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Google Gemini | GOOGLE_API_KEY or GEMINI_API_KEY | AIza... |
| xAI | XAI_API_KEY or GROK_API_KEY | xai-... |
| Ollama | (none required) | — |
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

#Quick Start
#Direct Instantiation (Recommended)
Import and instantiate providers directly:
from daita.llm import OpenAIProvider, AnthropicProvider, GrokProvider
# OpenAI
llm = OpenAIProvider(model="gpt-5.4-mini")
response = await llm.generate("Hello, world!")
# Anthropic Claude
llm = AnthropicProvider(model="claude-haiku-4-5")
response = await llm.generate("Analyze this data...")
# xAI Grok
llm = GrokProvider(model="grok-4.20")
response = await llm.generate("What's the latest news?")

#Factory Pattern
Use the factory when provider is determined at runtime:
from daita.llm import create_llm_provider
provider_name = config.get("llm_provider") # From config
llm = create_llm_provider(provider_name, "gpt-5.4-mini")
response = await llm.generate("Hello, world!")

#Streaming
All providers support real-time streaming for text and tool calling:
from daita.llm import OpenAIProvider
llm = OpenAIProvider(model="gpt-5.4-mini")
# Stream text generation
async for chunk in llm.generate("Write a story", stream=True):
    if chunk.type == "text":
        print(chunk.content, end="", flush=True)
    elif chunk.type == "tool_call_complete":
        print(f"\nTool: {chunk.tool_name}({chunk.tool_args})")

Chunk Types:
- "text" - Text content (field: content)
- "tool_call_complete" - Completed tool call (fields: tool_name, tool_args, tool_call_id)
All providers return the same chunk format for consistent handling across different LLMs.
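As an illustrative sketch only (the Chunk class and render function below are hypothetical stand-ins, not daita's actual classes), the chunk contract described above can be modeled like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the chunk objects yielded by generate(..., stream=True).
@dataclass
class Chunk:
    type: str                       # "text" or "tool_call_complete"
    content: Optional[str] = None   # set for "text" chunks
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None
    tool_call_id: Optional[str] = None

def render(chunk: Chunk) -> str:
    """Provider-agnostic handling: the same two branches work for every provider."""
    if chunk.type == "text":
        return chunk.content or ""
    if chunk.type == "tool_call_complete":
        return f"\nTool: {chunk.tool_name}({chunk.tool_args})"
    return ""
```

Because every provider emits the same two chunk types, downstream code like render() never needs provider-specific branches.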
#Factory Function
The create_llm_provider() factory function is useful when you need to dynamically select providers at runtime (e.g., from configuration files). For most cases, direct instantiation is simpler and more Pythonic:
from daita.llm import create_llm_provider
llm = create_llm_provider(
provider="openai", # Provider name
model="gpt-5.4-mini", # Model identifier
api_key="sk-...", # API key (optional if set in environment)
agent_id="my_agent", # For token tracking (optional)
temperature=0.7, # Model parameters (optional)
max_tokens=1000 # Additional provider-specific options
)

#Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | str | Yes | Provider name: 'openai', 'anthropic', 'grok', 'gemini', 'ollama', or 'mock' |
| model | str | Yes | Model identifier specific to the provider |
| api_key | str | No | API key (uses environment variables if not provided) |
| agent_id | str | No | Agent ID for token usage tracking |
| **kwargs | dict | No | Additional provider-specific parameters |
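Conceptually, the factory resolves a provider name to a class through a registry. The simplified sketch below (with a made-up _FakeProvider and create_provider, not daita's internals) shows the general shape; the real create_llm_provider additionally handles API keys and token tracking:

```python
# Minimal sketch of a name -> class registry behind a factory function.
class _FakeProvider:
    def __init__(self, model: str, **kwargs):
        self.model = model
        self.options = kwargs

_REGISTRY: dict = {"fake": _FakeProvider}

def create_provider(provider: str, model: str, **kwargs):
    """Look up the provider class by name and instantiate it."""
    try:
        cls = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"Unknown provider {provider!r}; known: {sorted(_REGISTRY)}")
    return cls(model, **kwargs)

llm = create_provider("fake", "fake-model", temperature=0.7)
```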
#Registry
List available providers:
from daita.llm import list_available_providers
providers = list_available_providers()
print(providers) # ['openai', 'anthropic', 'grok', 'gemini', 'ollama', 'mock']

#OpenAI Provider
The OpenAI provider supports OpenAI chat models including GPT-5.4 mini, GPT-5.4, GPT-5.5, and legacy GPT-4 variants.
#Configuration
from daita.llm import OpenAIProvider
# Basic OpenAI configuration
llm = OpenAIProvider(
model="gpt-5.4-mini",
api_key="sk-your-openai-key"
)
# Advanced configuration with custom parameters
llm = OpenAIProvider(
model="gpt-5.4-mini",
api_key="sk-your-openai-key",
temperature=0.7,
max_completion_tokens=1000,
reasoning_effort="medium",
service_tier="auto",
parallel_tool_calls=True,
frequency_penalty=0.1,
presence_penalty=0.1,
timeout=60
)

max_tokens is still accepted as a convenience alias. For current OpenAI models, Daita sends it as max_completion_tokens by default. Set use_legacy_max_tokens=True when targeting older OpenAI-compatible endpoints that still expect max_tokens.
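The alias behavior described in the note can be sketched as a simple keyword rewrite (normalize_token_limit is a hypothetical helper for illustration, not daita's internal name):

```python
def normalize_token_limit(params: dict, use_legacy_max_tokens: bool = False) -> dict:
    """Rewrite the convenience alias max_tokens into the keyword the endpoint expects."""
    params = dict(params)  # don't mutate the caller's dict
    if "max_tokens" in params and not use_legacy_max_tokens:
        # Current OpenAI models expect max_completion_tokens.
        params["max_completion_tokens"] = params.pop("max_tokens")
    return params
```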
#Advanced Features
# Custom parameters with conversation
messages = [
    {"role": "system", "content": "You are an expert code reviewer."},
    {"role": "user", "content": "Analyze this code for bugs: def foo(): return x"}
]
response = await llm.generate(messages, temperature=0.3, max_tokens=2000)
# Tool calling (function calling)
from daita.core.tools import tool
@tool
async def get_weather(location: str) -> dict:
    """Get weather for a location."""
    return {"temp": 72, "condition": "sunny"}

# Use tools with generate()
response = await llm.generate("What's the weather like?", tools=[get_weather])

#Anthropic Provider
The Anthropic provider supports Claude family models with their unique capabilities and safety features.
#Configuration
from daita.llm import AnthropicProvider
# Basic Anthropic configuration
llm = AnthropicProvider(
model="claude-haiku-4-5",
api_key="sk-ant-your-anthropic-key"
)
# Advanced configuration
llm = AnthropicProvider(
model="claude-sonnet-4-5",
api_key="sk-ant-your-anthropic-key",
temperature=0.5,
max_tokens=2000,
timeout=90
)

#Claude-Specific Features
# Long-form content generation
response = await llm.generate(
prompt="Write a comprehensive analysis of...",
max_tokens=4000,
temperature=0.7
)
# Document analysis with large context
response = await llm.generate(
prompt=f"Analyze this document: {large_document_text}",
max_tokens=1000
)

#xAI Grok Provider
The Grok provider connects to xAI's Grok models, which are optimized for real-time information and conversational AI.
#Configuration
from daita.llm import GrokProvider
# Basic Grok configuration
llm = GrokProvider(
model="grok-4.20",
api_key="xai-your-api-key"
)
# Configuration with custom base URL
llm = GrokProvider(
model="grok-vision-beta",
api_key="xai-your-api-key",
base_url="https://api.x.ai/v1",
timeout=60
)

#Grok-Specific Features
# Real-time information queries
response = await llm.generate(
prompt="What's happening in tech news today?",
temperature=0.8
)
# Vision capabilities (grok-vision-beta)
llm_vision = GrokProvider(model="grok-vision-beta")
response = await llm_vision.generate_with_image(
prompt="Describe this image",
image_path="./screenshot.png"
)

#Google Gemini Provider
The Gemini provider supports Google's latest generative AI models with multimodal capabilities.
#Configuration
from daita.llm import GeminiProvider
# Basic Gemini configuration
llm = GeminiProvider(
model="gemini-2.5-flash-lite",
api_key="AIza-your-google-api-key"
)
# Advanced configuration with safety settings
llm = GeminiProvider(
model="gemini-2.5-flash",
api_key="AIza-your-google-api-key",
temperature=0.9,
top_k=40,
response_mime_type="application/json",
safety_settings={
"HARM_CATEGORY_HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
"HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE"
},
generation_config={
"candidate_count": 1,
"max_output_tokens": 2048
}
)

Gemini provider calls also accept stop_sequences, response_schema, and thinking_config. These are forwarded into GenerateContentConfig alongside standard options such as temperature, top_p, and max_tokens.
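How these keywords might be folded into one generation config can be sketched as follows (build_generation_config and the exact key set are assumptions for illustration; the real provider constructs a GenerateContentConfig):

```python
# Keys forwarded into the generation config, per the note above (assumed set).
_FORWARDED = {"temperature", "top_p", "top_k", "max_tokens",
              "stop_sequences", "response_mime_type", "response_schema",
              "thinking_config"}

def build_generation_config(**kwargs) -> dict:
    """Collect forwarded keywords into a single config dict, ignoring the rest."""
    return {k: v for k, v in kwargs.items() if k in _FORWARDED}
```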
#Gemini-Specific Features
# Large context processing
response = await llm.generate(
prompt=f"Summarize this entire codebase: {massive_code_text}",
max_tokens=1000
)
# Multimodal capabilities
response = await llm.generate_with_media(
prompt="Explain what's happening in this video",
media_path="./demo_video.mp4"
)
# Safety-filtered generation
response = await llm.generate(
prompt="Generate content about...",
safety_settings={
"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_LOW_AND_ABOVE"
}
)

#Ollama Provider
The Ollama provider connects to a locally running Ollama server via its OpenAI-compatible API, letting you run agents against any model available through ollama pull — Llama 3.1, Mistral, Gemma 2, CodeStral, Phi 3, and more.
No API key is required. Ollama must be running locally (or at a reachable URL).
#Configuration
from daita.llm import OllamaProvider
# Basic — uses localhost:11434 by default
llm = OllamaProvider(model="llama3.1")
# Custom server URL
llm = OllamaProvider(
model="mistral",
base_url="http://gpu-server:11434/v1",
timeout=120
)

The server URL can also be set via the OLLAMA_BASE_URL environment variable.
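The resulting precedence (explicit base_url, then OLLAMA_BASE_URL, then the local default) can be sketched as (resolve_ollama_base_url is an illustrative helper, not daita's internal name):

```python
import os
from typing import Optional

def resolve_ollama_base_url(explicit: Optional[str] = None) -> str:
    """Precedence: explicit base_url > OLLAMA_BASE_URL env var > local default."""
    return explicit or os.environ.get("OLLAMA_BASE_URL") or "http://localhost:11434/v1"
```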
#Using with Agents
from daita import Agent
agent = Agent(
name="Local Agent",
llm_provider="ollama",
model="llama3.1",
)
await agent.start()
result = await agent.run("Analyze this data...")
await agent.stop()

#Supported Models
Any model available via ollama pull works. Common choices:
| Model | Use Case | Command |
|---|---|---|
| llama3.1 | General purpose | ollama pull llama3.1 |
| mistral | Fast, balanced | ollama pull mistral |
| codestral | Code generation | ollama pull codestral |
| gemma2 | Lightweight, efficient | ollama pull gemma2 |
| phi3 | Small but capable | ollama pull phi3 |
#Error Handling
The Ollama provider produces clear diagnostics for common issues:
- Connection refused — "Cannot connect to Ollama at ... Is Ollama running? Start it with: ollama serve"
- Cloud environment — "Ollama is a local-only LLM provider and cannot run in Daita Cloud. Use a cloud provider instead."
#Limitations
- Local only — Ollama cannot run in Daita Cloud deployments. Use a cloud provider (OpenAI, Anthropic, Gemini, Grok) for hosted agents.
- Tool calling — Supported, but quality varies by model. Llama 3.1 and Mistral have the best tool-calling support.
- Streaming — Fully supported via the OpenAI-compatible streaming API.
#Mock Provider
The Mock provider is designed for testing and development without making actual API calls or incurring costs.
#Configuration
from daita.llm import create_llm_provider

# Basic mock configuration
llm = create_llm_provider(
provider="mock",
model="test-model",
agent_id="test_agent"
)
# Mock with custom responses
llm = create_llm_provider(
provider="mock",
model="gpt-4-mock",
responses=["Hello! This is a mock response.", "Another mock response."],
delay=0.5 # Simulate API latency
)

#Features
- No API calls - Returns predefined responses
- Latency simulation - Configurable delays to simulate real API behavior
- Token tracking - Simulates token usage for testing
- Error simulation - Can simulate API failures for error handling tests
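A toy model of the behaviors listed above (response cycling plus error simulation) looks roughly like this; MockSketch is purely illustrative and not the actual mock provider implementation:

```python
import random

class MockSketch:
    """Toy model of the mock provider's response cycling and error simulation."""

    def __init__(self, responses, cycle_responses=True, error_rate=0.0, seed=None):
        self.responses = list(responses)
        self.cycle = cycle_responses
        self.error_rate = error_rate
        self._i = 0
        self._rng = random.Random(seed)

    def generate(self, prompt: str) -> str:
        # Simulate an API failure with probability error_rate.
        if self._rng.random() < self.error_rate:
            raise RuntimeError("simulated API failure")
        if self._i >= len(self.responses):
            if not self.cycle:
                raise IndexError("mock responses exhausted")
            self._i = 0  # wrap around when cycling is enabled
        response = self.responses[self._i]
        self._i += 1
        return response
```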
#Mock-Specific Configuration
# Detailed mock setup
llm = create_llm_provider(
provider="mock",
model="claude-mock",
responses=[
"This is the first mock response.",
"This is the second mock response.",
"This is the third mock response."
],
delay=1.0, # 1 second delay
cycle_responses=True, # Cycle through responses
simulate_tokens=True, # Track mock token usage
error_rate=0.1 # 10% chance of simulated errors
)
# Use in tests
response = await llm.generate("Any prompt")
print(response) # Returns one of the mock responses

#Multi-Provider Usage
You can use multiple providers in the same application for different use cases:
from daita.llm import create_llm_provider
# Different providers for different tasks
openai_llm = create_llm_provider("openai", "gpt-5.4-mini", agent_id="analyzer")
anthropic_llm = create_llm_provider("anthropic", "claude-haiku-4-5", agent_id="writer")
gemini_llm = create_llm_provider("gemini", "gemini-2.5-flash-lite", agent_id="summarizer")
# Use appropriate provider for each task
analysis = await openai_llm.generate("Analyze this data: ...")
content = await anthropic_llm.generate("Write an article about: ...")
summary = await gemini_llm.generate("Summarize this document: ...")

#Using with Agents
Agents use providers automatically when you specify llm_provider:
from daita import Agent
# Agent uses specified provider
agent = Agent(
name="Analyst",
llm_provider="anthropic",
model="claude-haiku-4-5"
)
await agent.start()
result = await agent.run("Analyze this data")

See Agent documentation for complete agent usage.
#Error Handling
All providers implement consistent error handling:
from daita.core.exceptions import LLMError
try:
    llm = create_llm_provider("openai", "gpt-5.4-mini")
    response = await llm.generate("Your prompt here")
except LLMError as e:
    print(f"LLM error: {e}")
    # Handle provider-specific errors
except Exception as e:
    print(f"Unexpected error: {e}")

#Common Error Types
- Authentication errors - Invalid API keys
- Rate limiting - Too many requests
- Model errors - Invalid model names
- Network errors - Connection issues
- Token limit errors - Prompt too long
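For transient failures such as rate limiting, a retry wrapper with exponential backoff is a common pattern. This is a generic sketch (generate_with_retry is not a daita API; it wraps any awaitable LLM call):

```python
import asyncio
import random

async def generate_with_retry(generate, prompt, retries=3, base_delay=1.0):
    """Retry a flaky async LLM call with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return await generate(prompt)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

Usage: `await generate_with_retry(llm.generate, "Your prompt here")`. In production you would catch LLMError rather than bare Exception.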
#Token Tracking
Token usage is automatically tracked when using agents:
from daita import Agent
# Create agent with LLM provider
agent = Agent(
name="My Agent",
llm_provider="openai",
model="gpt-5.4-mini"
)
# Use the agent (token usage tracked automatically)
await agent.run("analyze this text", data={"text": "Hello, world!"})
# Check token usage
usage = agent.get_token_usage()
print(f"Total tokens: {usage['total_tokens']}")
print(f"Prompt tokens: {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Estimated cost: ${usage['estimated_cost']:.4f}")

#Custom Providers
You can register custom LLM providers for specialized use cases:
from daita.llm import register_llm_provider, BaseLLMProvider
class CustomProvider(BaseLLMProvider):
    """Custom LLM provider implementation."""

    async def generate(self, prompt: str, **kwargs) -> str:
        # Your custom implementation
        return "Custom response"

# Register the provider
register_llm_provider("custom", CustomProvider)

# Use the custom provider
llm = create_llm_provider("custom", "custom-model")
response = await llm.generate("Test prompt")

#Best Practices
API Keys:
- Store keys in environment variables, never hardcode
- Use different keys for development and production
- Rotate keys regularly
Model Selection:
- Fast tasks: Gemini 2.5 Flash Lite, GPT-5.4 mini
- Balanced: Claude Sonnet, GPT-5.4
- Complex reasoning: Claude Opus, GPT-5.5
Error Handling:
- Always wrap LLM calls in try-except blocks
- Handle rate limiting with exponential backoff
- Consider fallback providers for resilience
Performance:
- Use streaming for better user experience
- Set appropriate max_tokens to control costs
- Monitor token usage with agent tracing
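The fallback-provider advice under Error Handling above can be sketched provider-agnostically; generate_with_fallback is an illustrative helper, not part of daita, and works with any objects exposing an async generate method:

```python
async def generate_with_fallback(providers, prompt):
    """Try providers in order; return the first successful response."""
    last_error = None
    for llm in providers:
        try:
            return await llm.generate(prompt)
        except Exception as exc:
            last_error = exc  # remember the failure and try the next provider
    raise RuntimeError("all providers failed") from last_error
```

For example, pass `[openai_llm, anthropic_llm]` so Anthropic serves as a backup when OpenAI is unavailable.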
#Next Steps
- Getting Started - Quick start tutorial
- Agent - Using LLMs in agents
- Authentication - API key setup and management
- Error Handling - Robust error management
- Tracing - Monitor LLM usage and costs