WebSearch Plugin

AI-optimized web search with automatic answer extraction, news filtering, and URL content fetching. Built on Tavily Search API.

#Installation

bash
pip install tavily-python

Or install with the websearch extra:

bash
pip install daita-agents[websearch]

#Quick Start

python
from daita import Agent
from daita.plugins import websearch
 
# Create plugin (uses TAVILY_API_KEY from environment)
search = websearch()
 
# Agent uses search tools autonomously
agent = Agent(
    name="Research Assistant",
    prompt="You are a research assistant. Use web search to find accurate, current information.",
    tools=[search],
    model="gpt-4o-mini"
)
 
await agent.start()
result = await agent.run("What are the latest developments in quantum computing?")
await agent.stop()

#Getting Started with Tavily

  1. Sign up: Visit tavily.com (free account)
  2. Get API key: Copy your API key from the dashboard
  3. Set environment variable: export TAVILY_API_KEY=tvly-xxxxx
  4. Start using: No additional configuration needed

Free Tier: 1,000 searches/month
Paid Tier: $0.002/search ($2 per 1,000 searches)

#Direct Usage

The plugin can be used directly without agents for programmatic search operations. For comprehensive Tavily API documentation, see the official Tavily docs. The main value of this plugin is agent integration - enabling LLMs to autonomously search the web, find recent news, and fetch content.

#Connection Parameters

python
websearch(
    api_key: Optional[str] = None,
    max_results: int = 5,
    search_depth: str = "basic",
    include_answer: bool = True,
    include_raw_content: bool = False,
    max_page_length: int = 10000,
    **kwargs
)

#Parameters

  • api_key (str, optional): Tavily API key; falls back to the TAVILY_API_KEY environment variable if not passed. One of the two is required.
  • max_results (int): Default number of search results (default: 5)
  • search_depth (str): Search depth - "basic" or "advanced" (default: "basic")
  • include_answer (bool): Include AI-extracted direct answer (default: True)
  • include_raw_content (bool): Include full HTML content (default: False)
  • max_page_length (int): Maximum characters for page fetching (default: 10,000)
  • **kwargs: Additional configuration parameters

#Connection Methods

python
# From environment variable (recommended)
import os
async with websearch() as search:
    results = await search.search("Python asyncio best practices")
 
# With explicit API key
async with websearch(api_key="tvly-xxxxx") as search:
    results = await search.search("machine learning", max_results=10)
 
# With custom configuration
async with websearch(
    api_key=os.getenv("TAVILY_API_KEY"),
    max_results=10,
    search_depth="advanced",  # More comprehensive search
    include_answer=True
) as search:
    results = await search.search("quantum computing")

#Direct Search Operations (Scripts)

For scripts that don't need agent capabilities:

python
from daita.plugins import websearch
import os
 
async with websearch(api_key=os.getenv("TAVILY_API_KEY")) as search:
    # Web search with AI answer
    results = await search.search("What is quantum computing?", max_results=5)
    print(f"Answer: {results['answer']}")
 
    for result in results['results']:
        print(f"\n{result['title']}")
        print(f"URL: {result['url']}")
        print(f"Relevance: {result['score']:.2f}")
        print(f"Content: {result['content'][:200]}...")
 
    # News search
    news = await search.search_news("AI developments", days=7, max_results=3)
    for article in news['results']:
        print(f"\n{article['title']}")
        print(f"Published: {article['published_date']}")
 
    # Fetch page content
    page = await search.fetch_page("https://example.com/article")
    print(f"Content: {page['content'][:500]}...")

#Using with Agents

The WebSearch plugin exposes search operations as tools that agents can use autonomously:

python
from daita import Agent
from daita.plugins import websearch
import os
 
# Create WebSearch plugin
search = websearch(api_key=os.getenv("TAVILY_API_KEY"))
 
# Pass plugin to agent - agent can now use search tools autonomously
agent = Agent(
    name="Research Assistant",
    prompt="""You are a research assistant with web search capabilities.
    When answering questions:
    1. Use search_web to find relevant information
    2. Use search_news for recent developments
    3. Analyze search results by relevance score (higher is better)
    4. Provide well-sourced answers with URLs
    """,
    model="gpt-4o-mini",
    tools=[search]
)
 
await agent.start()
 
# Agent autonomously uses search tools to answer questions
result = await agent.run("Research the current state of renewable energy technology")
 
# The agent will autonomously:
# 1. Use search_web tool to find general information
# 2. Use search_news tool to find recent developments
# 3. Analyze results and AI-extracted answers
# 4. Provide a comprehensive answer with sources
 
await agent.stop()

#Available Tools

The WebSearch plugin exposes these tools to LLM agents:

| Tool | Description | Parameters |
| --- | --- | --- |
| search_web | AI-optimized web search with answer extraction | query (required), max_results (optional), include_answer (optional) |
| search_news | Recent news search with date filtering | query (required), days (optional, default: 7), max_results (optional) |
| fetch_page | Fetch and extract clean text content from a URL | url (required) |

Tool Categories: search
Tool Source: plugin
API Key: Configured at plugin initialization or from the TAVILY_API_KEY environment variable

#Tool Usage Example

python
from daita import Agent
from daita.plugins import websearch
import os
 
# Setup WebSearch with tool integration
search = websearch(api_key=os.getenv("TAVILY_API_KEY"), max_results=8)
 
agent = Agent(
    name="Comprehensive Researcher",
    prompt="""You are an expert research assistant. When researching a topic:
    1. First use search_web to get general information and overview
    2. Then use search_news to find recent developments (if relevant)
    3. Analyze search results by relevance score
    4. Synthesize information from multiple sources
    5. Provide a structured answer with key points and sources
    """,
    model="gpt-4o-mini",
    tools=[search]
)
 
await agent.start()
 
# Natural language command - agent uses tools autonomously
result = await agent.run("""
Research quantum computing applications in cryptography.
Provide a comprehensive overview including:
1. Current state of the technology
2. Recent developments
3. Key applications
Include sources with URLs.
""")
 
# Agent orchestrates search tool calls autonomously
print(result)
await agent.stop()

#Direct Tool Calling

You can also call search tools directly without LLM autonomy:

python
from daita import Agent
from daita.plugins import websearch
 
agent = Agent(
    name="searcher",
    tools=[websearch()],
    model="gpt-4o-mini"
)
 
# Call search_web tool directly
result = await agent.call_tool("search_web", {
    "query": "Python async programming",
    "max_results": 5,
    "include_answer": True
})
 
print(f"AI Answer: {result['answer']}")
print(f"Found {result['count']} results")
 
# Call search_news tool directly
news = await agent.call_tool("search_news", {
    "query": "artificial intelligence",
    "days": 7,
    "max_results": 3
})
 
print(f"Found {news['count']} recent articles")

#Search Methods

#search(query, max_results=None, include_answer=None)

General web search with AI-optimized results.

Parameters:

  • query (str): Search query
  • max_results (int, optional): Number of results (uses plugin default if not specified)
  • include_answer (bool, optional): Include AI-extracted answer (uses plugin default if not specified)

Returns:

python
{
    "success": True,
    "query": "search query",
    "answer": "AI-extracted direct answer to the query",
    "results": [
        {
            "title": "Result title",
            "url": "https://...",
            "content": "LLM-optimized content snippet",
            "score": 0.95,  # Relevance score (0-1)
            "published_date": "2025-01-15"  # If available
        }
    ],
    "count": 5
}
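
Because the response is plain data, post-processing needs no SDK calls. For example, a small helper (a sketch against the shape above; `top_citations` is not part of the plugin) that keeps only high-relevance results and formats them as citations:

```python
def top_citations(response: dict, min_score: float = 0.8) -> list[str]:
    """Keep results at or above min_score and format them as 'title - url' citations."""
    return [
        f"{r['title']} - {r['url']}"
        for r in response.get("results", [])
        if r.get("score", 0.0) >= min_score
    ]

# A response in the shape documented above (values are illustrative)
sample = {
    "success": True,
    "results": [
        {"title": "A", "url": "https://a.example", "content": "...", "score": 0.95},
        {"title": "B", "url": "https://b.example", "content": "...", "score": 0.60},
    ],
}
citations = top_citations(sample)  # keeps only the 0.95-scored result
```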

#search_news(query, days=7, max_results=None)

Search for recent news articles with date filtering.

Parameters:

  • query (str): News search query
  • days (int): How many days back to search (default: 7)
  • max_results (int, optional): Number of results (uses plugin default if not specified)

Returns:

python
{
    "success": True,
    "query": "news query",
    "results": [
        {
            "title": "Article title",
            "url": "https://...",
            "content": "Article content snippet",
            "score": 0.90,
            "published_date": "2025-12-20"  # Guaranteed for news
        }
    ],
    "count": 3
}
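
Since published_date is guaranteed for news results and uses ISO format, recency ordering is straightforward (a sketch; `newest_first` is a hypothetical helper, not part of the plugin):

```python
from datetime import date

def newest_first(response: dict) -> list[dict]:
    """Sort news results newest-first using their ISO published_date."""
    return sorted(
        response["results"],
        key=lambda r: date.fromisoformat(r["published_date"]),
        reverse=True,
    )
```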

#fetch_page(url)

Fetch and extract clean text content from a URL.

Parameters:

  • url (str): URL to fetch

Returns:

python
{
    "success": True,
    "url": "https://...",
    "content": "Extracted plain text content",
    "length": 1234,
    "truncated": False
}

#Error Handling

python
from daita.plugins import websearch
from daita.core.exceptions import (
    AuthenticationError,
    RateLimitError,
    TransientError,
    PermanentError
)
 
try:
    async with websearch(api_key="tvly-xxxxx") as search:
        results = await search.search("test query")
except ValueError as e:
    # Configuration errors (missing API key, invalid search_depth)
    if "API key is required" in str(e):
        print("Error: Set TAVILY_API_KEY environment variable")
        print("Get a free key at https://tavily.com")
    elif "search_depth must be" in str(e):
        print("Error: search_depth must be 'basic' or 'advanced'")
except AuthenticationError as e:
    # Invalid API key (401)
    print(f"Authentication failed: {e}")
    print("Check your Tavily API key at https://tavily.com")
except RateLimitError as e:
    # Rate limit exceeded (429)
    print(f"Rate limit exceeded: {e}")
    print(f"Wait {e.retry_after} seconds or upgrade plan")
except TransientError as e:
    # Temporary issues (network, service unavailable)
    print(f"Temporary error: {e}")
    print("Retrying may succeed")
except PermanentError as e:
    # Bad request (400), forbidden (403)
    print(f"Permanent error: {e}")
    print("Check query format or API access")
except RuntimeError as e:
    # Dependency not installed
    if "tavily-python not installed" in str(e):
        print("Install Tavily SDK: pip install tavily-python")

#Common Error Messages

| Error | Cause | Solution |
| --- | --- | --- |
| Tavily API key is required | Missing API key | Set TAVILY_API_KEY environment variable or pass api_key parameter |
| Invalid Tavily API key | Wrong or expired API key | Check API key at tavily.com dashboard |
| Rate limit exceeded | Too many requests | Wait for rate limit reset or upgrade to paid tier |
| tavily-python not installed | Missing dependency | Run pip install tavily-python |
| search_depth must be 'basic' or 'advanced' | Invalid configuration | Use "basic" or "advanced" for search_depth |

#Best Practices

#1. API Key Security

python
# Good: Use environment variables
import os
search = websearch(api_key=os.getenv("TAVILY_API_KEY"))
 
# Bad: Hardcode API keys
search = websearch(api_key="tvly-xxxxx")  # Don't do this!

#2. Optimize Search Results

python
# For quick answers: fewer results with AI answer
search = websearch(max_results=3, include_answer=True)
 
# For comprehensive research: more results
search = websearch(max_results=10, include_answer=True)
 
# For advanced search (costs more): use advanced depth
search = websearch(search_depth="advanced", max_results=10)
#3. Choose the Right Search Method

python
# Use search_news for time-sensitive queries
news = await search.search_news("Python 3.12 release", days=30)
 
# Use search_web for general information
info = await search.search("Python programming best practices")

#4. Relevance Score Usage

python
results = await search.search("machine learning", max_results=10)
 
# Filter by relevance score
high_quality = [r for r in results['results'] if r['score'] > 0.8]
 
# Sort by relevance (already sorted by default)
for result in results['results']:
    print(f"{result['title']} (score: {result['score']:.2f})")

#5. Context Manager Pattern

python
# Good: Use context manager for automatic cleanup
async with websearch() as search:
    results = await search.search("query")
    # Cleanup happens automatically
 
# Also good: Manual connect/disconnect
search = websearch()
await search.connect()
try:
    results = await search.search("query")
finally:
    await search.disconnect()

#Cost Optimization

Free Tier: 1,000 searches/month

Tips to reduce costs:

  • Use include_answer=True to get direct answers without additional processing
  • Set appropriate max_results (default: 5) - don't request more than needed
  • Use search_depth="basic" for most queries (advanced costs more)
  • Cache results when possible to avoid duplicate searches
  • Use fetch_page only as a fallback when Tavily doesn't have the page
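
The caching tip above can be as simple as an in-memory dict keyed by the query parameters (a sketch; `CachedSearch` is a hypothetical wrapper with no TTL or eviction):

```python
class CachedSearch:
    """Wrap a search coroutine with an in-memory cache keyed by query parameters.

    `search` is assumed to be a callable like the plugin's search method:
    awaitable, taking (query, max_results=...). Identical calls hit the
    cache instead of spending another billable search.
    """
    def __init__(self, search):
        self._search = search
        self._cache = {}

    async def search(self, query, max_results=5):
        key = (query, max_results)
        if key not in self._cache:
            self._cache[key] = await self._search(query, max_results=max_results)
        return self._cache[key]
```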

Pricing:

  • Basic search: $0.002/search ($2 per 1,000)
  • Advanced search: Higher cost for deeper crawling
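
As a quick budget sanity check, the basic-search arithmetic above can be expressed directly (a sketch; advanced-search pricing is not listed, so it is not modeled):

```python
FREE_SEARCHES_PER_MONTH = 1_000
COST_PER_BASIC_SEARCH = 0.002  # dollars per search, per the pricing above

def estimated_monthly_cost(searches: int) -> float:
    """Estimate monthly spend for basic searches beyond the free tier."""
    billable = max(0, searches - FREE_SEARCHES_PER_MONTH)
    return billable * COST_PER_BASIC_SEARCH
```

For example, 1,500 basic searches in a month leaves 500 billable searches after the free tier.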

#Why Tavily?

Tavily is the industry-standard search API for AI agents, used by:

  • LangChain
  • CrewAI
  • AutoGen
  • LlamaIndex
  • Haystack

Key advantages:

  • Built for AI: Results pre-formatted for LLM consumption
  • AI-extracted answers: Automatic answer extraction from search results
  • Relevance scoring: Built-in scores help agents prioritize results
  • LLM-optimized content: Clean, summarized content snippets
  • Production-ready: Official API with SLA guarantees (no web scraping)
  • Cost-effective: Generous free tier + affordable pricing

#Advanced Usage

#Multiple Sequential Searches

python
async with websearch() as search:
    queries = [
        "quantum computing basics",
        "quantum computing applications",
        "quantum computing recent breakthroughs"
    ]
 
    all_results = []
    for query in queries:
        results = await search.search(query, max_results=3)
        all_results.append({
            "query": query,
            "answer": results['answer'],
            "count": results['count']
        })
 
    # Process all results
    for item in all_results:
        print(f"{item['query']}: {item['answer'][:100]}...")
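
Independent queries can also run concurrently instead of sequentially; a sketch using asyncio.gather (the `search_all` helper is hypothetical, and concurrent calls still count against your quota):

```python
import asyncio

async def search_all(search, queries, max_results=3):
    """Run independent searches concurrently and pair each query with its response.

    `search` is assumed to be an awaitable callable like the plugin's
    search method, taking (query, max_results=...).
    """
    responses = await asyncio.gather(
        *(search(q, max_results=max_results) for q in queries)
    )
    return dict(zip(queries, responses))
```

On the free tier, heavy concurrency makes RateLimitError more likely, so keep batches small.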

#Comprehensive Research Workflow

python
from daita import Agent
from daita.plugins import websearch
 
agent = Agent(
    name="research_agent",
    tools=[websearch(max_results=8)],
    model="gpt-4o-mini",
    prompt="""You are a comprehensive research assistant. For each topic:
    1. Use search_web to understand the topic fundamentals
    2. Use search_news to find latest developments
    3. Analyze by relevance scores
    4. Synthesize into structured overview with citations
    """
)
 
await agent.start()
result = await agent.run("""
Research: 'Renewable energy technology trends'
Provide:
- Current state overview
- Recent breakthroughs (last 30 days)
- Key applications
- Future outlook
Include sources with URLs
""")
await agent.stop()

#Resources

  • Tavily Website: tavily.com
  • Tavily Documentation: docs.tavily.com
  • API Dashboard: app.tavily.com (manage API keys, view usage)
  • Pricing: tavily.com/pricing
  • Example Code: See examples/websearch_example.py in the DAITA repository
  • Test Suite: See examples/websearch_real_test.py for comprehensive testing examples

#Support

For plugin issues, see DAITA GitHub Issues.
For Tavily API issues, see Tavily Support.