Error Handling

The Daita error handling system classifies every exception by how it should be retried and retries failed operations automatically. The goal is reliable operation with clear debugging information and graceful failure modes.

#Error Handling Philosophy

Every Daita exception carries a "retry hint" that tells the framework how to recover: retry immediately, retry with exponential backoff, or do not retry at all. Recovery can therefore happen without manual intervention.

python
from daita.core.exceptions import TransientError, PermanentError, RetryableError
 
# The three raise statements below are independent examples, not one runnable sequence.
# Transient errors - retry immediately
raise TransientError("Rate limit exceeded")
 
# Retryable errors - retry with exponential backoff
raise RetryableError("Database temporarily unavailable")
 
# Permanent errors - don't retry, fix the issue
raise PermanentError("Invalid API key format")

#Exception Hierarchy

All Daita exceptions inherit from DaitaError and include retry hints and contextual information.
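
As a rough illustration of how retry hints can ride along on an exception hierarchy, here is a minimal sketch built from stand-in classes. The `Sketch*` class names, the string values of `retry_hint`, the constructor signature, and the `describe()` helper are all assumptions made for illustration. They mirror the method and attribute names used on this page (`is_transient()`, `is_retryable()`, `is_permanent()`, `context`) but are not Daita's actual implementation.

```python
from typing import Any, Dict, Optional


class SketchError(Exception):
    """Stand-in base error that carries a retry hint and a context dict."""

    retry_hint = "permanent"  # hypothetical default: do not retry

    def __init__(self, message: str, context: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.context = dict(context or {})

    def is_transient(self) -> bool:
        return self.retry_hint == "transient"

    def is_retryable(self) -> bool:
        return self.retry_hint == "retryable"

    def is_permanent(self) -> bool:
        return self.retry_hint == "permanent"


class SketchTransientError(SketchError):
    retry_hint = "transient"


class SketchRetryableError(SketchError):
    retry_hint = "retryable"


class SketchPermanentError(SketchError):
    retry_hint = "permanent"


def describe(error: SketchError) -> str:
    """Map an error's retry hint to the recovery action it implies."""
    if error.is_transient():
        return "retry immediately"
    if error.is_retryable():
        return "retry with backoff"
    return "do not retry"
```

Putting the hint on the class rather than on each instance means a subclass only has to override one attribute to change how it is retried.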

#Base Exception - DaitaError

python
from daita.core.exceptions import DaitaError
 
try:
    result = await some_operation()
except DaitaError as e:
    if e.is_transient():
        print("Will retry immediately")
    elif e.is_retryable():
        print("Will retry with backoff")
    elif e.is_permanent():
        print("Manual intervention required")
    print(f"Context: {e.context}")

#Component-Specific Exceptions

| Exception | When to Use | Retry Hint |
| --- | --- | --- |
| AgentError | Agent operation failures | Varies by cause |
| LLMError | LLM provider errors | Usually retryable |
| PluginError | Plugin/database errors | Usually retryable |
| ConfigError | Configuration issues | Always permanent |
| WorkflowError | Workflow failures | Varies by cause |

#Retry-Specific Exception Classes

#TransientError - Immediate Retry

Temporary issues that resolve quickly. Retried with minimal delay.

| Exception | Use Case | Additional Fields |
| --- | --- | --- |
| RateLimitError | API rate limiting | retry_after |
| TimeoutError | Network timeouts | timeout_duration |
| ConnectionError | Connection failures | host, port |
| ServiceUnavailableError | Service downtime | service_name |

python
# Note: importing TimeoutError here shadows Python's built-in TimeoutError in this module
from daita.core.exceptions import RateLimitError, TimeoutError
 
# Rate limiting
try:
    response = await api_client.get("/data")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
 
# Timeouts
try:
    result = await slow_operation()
except TimeoutError as e:
    print(f"Timed out after {e.timeout_duration}s")

#RetryableError - Exponential Backoff

Issues that may resolve with time. Retried with exponential backoff.

| Exception | Use Case | Additional Fields |
| --- | --- | --- |
| ResourceBusyError | Resource contention | resource_name |
| DataInconsistencyError | Temporary inconsistency | data_source |
| ProcessingQueueFullError | Queue overload | queue_name |

#PermanentError - No Retry

Fundamental issues requiring manual intervention. Not retried.

| Exception | Use Case | Additional Fields |
| --- | --- | --- |
| AuthenticationError | Invalid credentials | provider |
| PermissionError | Access denied | resource, action |
| ValidationError | Invalid data | field, value |
| NotFoundError | Missing resource | resource_type, resource_id |

#Automatic Retry Logic

Agents automatically retry failed operations based on error classification. See Overview for detailed retry configuration.

#Basic Configuration

python
from daita import Agent
from daita.config import RetryPolicy
 
# Simple: Enable retry with defaults
agent = Agent(
    name="Resilient Agent",
    enable_retry=True  # 3 retries, exponential backoff
)
 
# Advanced: Custom retry policy
agent = Agent(
    name="Custom Agent",
    enable_retry=True,
    retry_policy=RetryPolicy(
        max_retries=5,
        initial_delay=2.0
    )
)
 
# Automatic retry handling
result = await agent.run("Process this data")

How it works:

  • Transient errors → Retry immediately with minimal delay
  • Retryable errors → Retry with exponential backoff (1s, 2s, 4s, 8s...)
  • Permanent errors → No retry, fail immediately
  • Random jitter is added to each delay so that many clients do not retry at the same instant (the "thundering herd" problem)
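
The dispatch rules above can be sketched as a small retry loop. This is a minimal illustration, not Daita's retry engine: the three exception classes are local stand-ins so the sketch runs without Daita installed, `run_with_retry` and `backoff_delay` are hypothetical helpers, and the jitter formula (up to 10% on top of the base delay) is an assumption. The `max_retries` and `initial_delay` parameter names are borrowed from the `RetryPolicy` example.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in: retry with minimal delay."""


class RetryableError(Exception):
    """Stand-in: retry with exponential backoff."""


class PermanentError(Exception):
    """Stand-in: never retry."""


def backoff_delay(attempt: int, initial_delay: float = 1.0, jitter: float = 0.1) -> float:
    """Delay before retry `attempt` (0-based): initial_delay * 2**attempt plus random jitter."""
    base = initial_delay * (2 ** attempt)
    return base + random.uniform(0, jitter * base)


async def run_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    initial_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `operation`, retrying according to the class of error it raises."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except PermanentError:
            raise  # no retry: fail immediately
        except TransientError:
            if attempt == max_retries:
                raise
            await sleep(0)  # minimal delay: just yield to the event loop
        except RetryableError:
            if attempt == max_retries:
                raise
            await sleep(backoff_delay(attempt, initial_delay))
    # Only reachable if max_retries is negative, so no attempt was made.
    raise ValueError("max_retries must be >= 0")
```

The `sleep` parameter is injectable so that tests can replace `asyncio.sleep` with a fake and record the requested delays instead of waiting.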

#Error Handling Patterns

#Basic Error Handling

python
from daita import Agent
from daita.core.exceptions import AgentError, LLMError, ValidationError
 
agent = Agent(name="MyAgent", enable_retry=True)
 
try:
    result = await agent.run("Process this data")
    print(f"Success: {result}")
 
except ValidationError as e:
    # Permanent errors - fix input and retry
    print(f"Invalid input: {e}")
    # Fix data and try again
 
except LLMError as e:
    # LLM provider errors - usually retryable
    if e.is_permanent():
        print(f"Permanent LLM error (for example, an invalid API key): {e}")
    else:
        print(f"Temporary LLM error: {e}")
        # Automatic retry has already been attempted by this point
 
except AgentError as e:
    # Agent errors with context
    print(f"Agent failed: {e}")
    print(f"Context: {e.context}")
 
except Exception as e:
    # Unexpected errors
    print(f"Unexpected error: {e}")
    raise

#Graceful Degradation

python
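from daita.core.exceptions import LLMError

# Assumes ai_agent, get_rule_based_recommendations, and get_popular_items are defined elsewhere.
async def get_recommendations(user_id):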
    """Get recommendations with fallback strategies."""
 
    try:
        # Primary: AI recommendations
        return await ai_agent.run(f"Recommend for user {user_id}")
    except LLMError:
        # Fallback: Rule-based recommendations
        return get_rule_based_recommendations(user_id)
    except Exception:
        # Final fallback: Popular items
        return get_popular_items()

#Error Monitoring & Debugging

All errors are automatically traced through Daita's built-in tracing system. See Automatic Tracing for details.

#Error Information

python
from daita.core.exceptions import DaitaError
 
try:
    result = await agent.run("Process data")
except DaitaError as e:
    # All Daita exceptions include:
    print(f"Error: {e}")
    print(f"Retry hint: {e.retry_hint}")
    print(f"Context: {e.context}")
 
    # Component-specific fields
    if hasattr(e, 'agent_id'):
        print(f"Agent: {e.agent_id}")
    if hasattr(e, 'provider'):
        print(f"Provider: {e.provider}")

#Logging Best Practices

  • Use structured logging with error context
  • Log permanent errors as errors, transient as warnings
  • Include retry attempt information
  • Track error rates and trends
  • Set up alerts for high error rates
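
One way to apply these practices with Python's standard `logging` module is sketched below. The `log_error` helper is hypothetical, and it assumes that `retry_hint` compares equal to the strings `"transient"`, `"retryable"`, or `"permanent"`; the real attribute may be an enum or another type, so adapt the comparison accordingly. Exceptions without a `retry_hint` are treated as permanent.

```python
import json
import logging

logger = logging.getLogger("error_sketch")


def log_error(error: Exception, attempt: int, max_retries: int) -> int:
    """Log `error` as one JSON line and return the logging level used."""
    # Assumed hint values: "transient", "retryable", "permanent".
    hint = getattr(error, "retry_hint", "permanent")
    # Permanent errors are logged as errors, everything else as warnings.
    level = logging.ERROR if hint == "permanent" else logging.WARNING
    payload = {
        "error_type": type(error).__name__,
        "message": str(error),
        "retry_hint": hint,
        "attempt": attempt,
        "max_retries": max_retries,
        "context": getattr(error, "context", {}),
    }
    # default=str keeps the log call from failing on non-JSON context values.
    logger.log(level, json.dumps(payload, default=str))
    return level
```

Emitting one JSON object per log line keeps the error type, retry attempt, and context queryable by whatever log aggregation tool is in use.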

#Best Practices

Exception Types:

  • Use specific exception types (ValidationError, RateLimitError, etc.) not generic Exception
  • Raise TransientError for temporary issues (network, rate limits)
  • Raise RetryableError for resource contention or queue issues
  • Raise PermanentError for auth, validation, or configuration errors
  • Include descriptive error messages

Error Context:

  • Always include relevant context in exceptions
  • Add operation name, user ID, resource IDs to context
  • Use create_contextual_error() to wrap standard exceptions
  • Include timestamps for debugging
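
The signature of `create_contextual_error()` is not shown on this page, so the sketch below does not call it. Instead it illustrates the general idea with a hypothetical `wrap_with_context` helper and a stand-in `ContextualError` class: wrap a standard exception, record its type and a UTC timestamp, and attach caller-supplied context such as the operation name and user ID.

```python
from datetime import datetime, timezone
from typing import Any, Dict


class ContextualError(Exception):
    """Stand-in wrapper that carries a context dict alongside the message."""

    def __init__(self, message: str, context: Dict[str, Any]):
        super().__init__(message)
        self.context = context


def wrap_with_context(original: Exception, **context: Any) -> ContextualError:
    """Wrap `original`, recording its type, a UTC timestamp, and caller context."""
    merged = {
        "original_type": type(original).__name__,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **context,
    }
    wrapped = ContextualError(str(original), merged)
    wrapped.__cause__ = original  # keep the original exception reachable
    return wrapped
```

Inside an `except` block, a caller would typically write `raise wrap_with_context(exc, operation="parse_quantity", user_id="u-123") from exc` so the traceback still shows the original failure.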

Graceful Degradation:

  • Implement fallback strategies for critical features
  • Order fallbacks from richest to simplest: AI → Rule-based → Static
  • Return cached data when services are unavailable
  • Don't fail completely when non-critical features break
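
A tiered fallback with a cache of the last good result might look like the following sketch. The `with_fallbacks` helper and the module-level `_cache` dict are hypothetical and kept deliberately simple: there is no cache expiry, and failures are skipped silently where real code would log them.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Sequence

_cache: Dict[str, Any] = {}  # last successful result per key (no expiry in this sketch)


async def with_fallbacks(
    key: str,
    strategies: Sequence[Callable[[], Awaitable[Any]]],
    static_default: Any,
) -> Any:
    """Try each strategy in order; fall back to cached data, then a static default."""
    for strategy in strategies:
        try:
            result = await strategy()
        except Exception:
            continue  # this tier failed; real code should log before moving on
        _cache[key] = result  # remember the last good answer
        return result
    # Every tier failed: serve cached data if we have it, else the static default.
    return _cache.get(key, static_default)
```

Passing the strategies as an ordered sequence keeps the fallback order visible at the call site, for example `[ai_recommendations, rule_based_recommendations]`.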

Monitoring & Alerting:

  • Track error rates and patterns
  • Alert on high authentication failure rates
  • Monitor rate limiting frequency
  • Track timeout rates for performance issues
  • Use structured logging for analysis
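
Tracking an error rate for alerting can be as simple as a sliding-window counter. The `ErrorRateMonitor` class below is a hypothetical, in-process sketch; the window length and threshold defaults are arbitrary, and a production setup would more likely export these counts to a metrics system.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class ErrorRateMonitor:
    """Track the failure rate over a sliding time window."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        threshold: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._clock = clock  # injectable so tests can control time
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, failed: bool) -> None:
        """Record one operation outcome."""
        now = self._clock()
        self._events.append((now, failed))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events older than the window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def error_rate(self) -> float:
        """Fraction of recorded operations in the window that failed."""
        self._evict(self._clock())
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_alert(self) -> bool:
        return self.error_rate() >= self.threshold
```

Keeping one monitor per error category (authentication failures, rate limits, timeouts) lets each have its own threshold.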

Testing:

  • Test that transient errors are retried
  • Verify permanent errors don't retry
  • Mock failures to test error paths
  • Test fallback strategies work correctly
  • Verify error context is preserved
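
The first three points can be exercised with `unittest.mock.AsyncMock` from the standard library, as in the sketch below. The exception classes and `call_with_retry` are small local stand-ins so the tests run without Daita installed; in a real suite the same pattern would target the agent or operation under test.

```python
import asyncio
from unittest.mock import AsyncMock


class TransientError(Exception):
    """Stand-in for a retry-immediately error."""


class PermanentError(Exception):
    """Stand-in for a never-retry error."""


async def call_with_retry(operation, max_retries: int = 3):
    """Tiny retry helper under test: retries TransientError only."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except TransientError:
            if attempt == max_retries:
                raise


def test_transient_errors_are_retried():
    # Fail twice, then succeed: expect three awaits in total.
    operation = AsyncMock(side_effect=[TransientError("t1"), TransientError("t2"), "ok"])
    assert asyncio.run(call_with_retry(operation)) == "ok"
    assert operation.await_count == 3


def test_permanent_errors_are_not_retried():
    operation = AsyncMock(side_effect=PermanentError("Invalid API key format"))
    try:
        asyncio.run(call_with_retry(operation))
    except PermanentError as exc:
        assert str(exc) == "Invalid API key format"  # message is preserved
    else:
        raise AssertionError("PermanentError was swallowed")
    assert operation.await_count == 1  # called once, never retried
```

Asserting on `await_count` checks the retry behavior directly rather than inferring it from the final result.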

#Next Steps