BigQuery Plugin

Google BigQuery data warehouse connection and querying for agents, built on `google-cloud-bigquery` with asyncio executors.

#Installation

bash
pip install 'daita-agents[bigquery]'

#Quick Start

python
from daita import Agent
from daita.plugins import bigquery
 
# Create plugin
bq = bigquery(
    project="my-gcp-project",
    dataset="analytics"
)
 
# Create agent with BigQuery access
agent = Agent(
    name="Warehouse Analyst",
    model="gpt-4o-mini",
    prompt="You are a data analyst. Query BigQuery to answer questions.",
    tools=[bq]
)
 
await agent.start()
result = await agent.run("What were our top 10 products by revenue last quarter?")

#Direct Usage

The plugin can be used directly without agents for programmatic access:

python
from daita.plugins import bigquery
 
async with bigquery(project="my-project", dataset="analytics") as bq:
    results = await bq.query("SELECT * FROM users LIMIT 10")
    tables = await bq.tables()
    schema = await bq.describe("users")

#Connection Parameters

python
bigquery(
    project: Optional[str] = None,
    dataset: Optional[str] = None,
    credentials_path: Optional[str] = None,
    location: Optional[str] = None,
    timeout: int = 300,
    read_only: bool = False
)

#Parameters

  • project (str): GCP project ID. Required; if not passed directly, falls back to the BIGQUERY_PROJECT or GOOGLE_CLOUD_PROJECT env vars
  • dataset (str): Default dataset for unqualified table references. Falls back to BIGQUERY_DATASET
  • credentials_path (str): Path to service account JSON key file. Falls back to GOOGLE_APPLICATION_CREDENTIALS. If omitted, Application Default Credentials are used
  • location (str): BigQuery location (default: "US"). Falls back to BIGQUERY_LOCATION
  • timeout (int): Query timeout in seconds (default: 300)
  • read_only (bool): Restrict the plugin to SELECT queries; bigquery_execute is not exposed as a tool (default: False)

#Environment Variables

| Variable | Maps to |
| --- | --- |
| `BIGQUERY_PROJECT` / `GOOGLE_CLOUD_PROJECT` | `project` |
| `BIGQUERY_DATASET` | `dataset` |
| `GOOGLE_APPLICATION_CREDENTIALS` | `credentials_path` |
| `BIGQUERY_LOCATION` | `location` |
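
The precedence here (explicit argument first, then environment variable, then a default) can be sketched as follows. This is a rough illustration only; `resolve_setting` is a hypothetical helper, and the plugin's actual resolution logic is internal:

```python
import os

def resolve_setting(explicit, *env_vars, default=None):
    """Pick the explicit value if given, else the first set env var, else a default."""
    if explicit is not None:
        return explicit
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            return value
    return default

# Explicit arguments win over environment variables.
os.environ["BIGQUERY_PROJECT"] = "env-project"
project = resolve_setting(None, "BIGQUERY_PROJECT", "GOOGLE_CLOUD_PROJECT")
location = resolve_setting(None, "BIGQUERY_LOCATION", default="US")
```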

#Authentication

#Application Default Credentials

bash
gcloud auth application-default login

python
bq = bigquery(project="my-project", dataset="analytics")

#Service Account Key

python
bq = bigquery(
    project="my-project",
    dataset="analytics",
    credentials_path="/path/to/service-account.json"
)

#Core Methods

#query()

Execute a SELECT query and return rows as a list of dicts:

python
results = await bq.query("SELECT name, email FROM users WHERE active = true LIMIT 100")
# [{"name": "Alice", "email": "alice@example.com"}, ...]

Parameterized queries use %s placeholders (automatically converted to BigQuery's @p0, @p1 syntax):

python
results = await bq.query(
    "SELECT * FROM events WHERE event_type = %s AND created_at > %s",
    params=["purchase", "2026-01-01"]
)
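
The %s-to-named-parameter conversion can be sketched roughly like this (a simplified illustration; the plugin's actual implementation may handle more edge cases, such as %s inside string literals):

```python
def convert_placeholders(sql, params):
    """Convert %s placeholders to BigQuery named parameters @p0, @p1, ..."""
    parts = sql.split("%s")
    if len(parts) - 1 != len(params):
        raise ValueError("placeholder/parameter count mismatch")
    out = parts[0]
    named = {}
    for i, (param, tail) in enumerate(zip(params, parts[1:])):
        name = f"p{i}"
        named[name] = param
        out += f"@{name}" + tail
    return out, named

sql, named = convert_placeholders(
    "SELECT * FROM events WHERE event_type = %s AND created_at > %s",
    ["purchase", "2026-01-01"],
)
```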

#execute()

Execute DML/DDL statements and return the number of affected rows:

python
affected = await bq.execute(
    "DELETE FROM logs WHERE created_at < %s",
    params=["2025-01-01"]
)
print(f"Deleted {affected} rows")

#tables()

List all tables in a dataset:

python
table_list = await bq.tables()                      # uses default dataset
table_list = await bq.tables(dataset="other_ds")    # specific dataset

#datasets()

List all datasets in the project:

python
ds_list = await bq.datasets()

#describe()

Get column schema for a table:

python
columns = await bq.describe("users")
# [{"column_name": "id", "data_type": "INT64", "is_nullable": "NO"}, ...]
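
Because describe() returns plain dicts, the schema is easy to post-process; for example, a small (hypothetical) formatter that renders it as compact text for a prompt:

```python
def format_schema(table, columns):
    """Render describe()-style rows as a compact, prompt-friendly listing."""
    lines = [f"{table}:"]
    for col in columns:
        null = "" if col["is_nullable"] == "YES" else " NOT NULL"
        lines.append(f"  {col['column_name']} {col['data_type']}{null}")
    return "\n".join(lines)

text = format_schema("users", [
    {"column_name": "id", "data_type": "INT64", "is_nullable": "NO"},
    {"column_name": "email", "data_type": "STRING", "is_nullable": "YES"},
])
```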

#count_rows() / sample_rows()

python
count = await bq.count_rows("events")
count = await bq.count_rows("events", filter="event_type = 'purchase'")
 
sample = await bq.sample_rows("events", n=10)

#Available Tools

When used with an agent, BigQuery exposes these LLM-callable tools:

| Tool | Description | Key Parameters |
| --- | --- | --- |
| bigquery_query | Run a SELECT query | sql, params, focus |
| bigquery_inspect | List tables and column schemas | dataset, tables |
| bigquery_count | Count rows (optionally filtered) | table, filter |
| bigquery_sample | Random sample of rows | table, n |
| bigquery_list_datasets | List all datasets | |
| bigquery_execute | Run DML/DDL (write mode only) | sql, params |

bigquery_execute is only available when the plugin is not in read-only mode.

#Tool Usage Example

python
from daita import Agent
from daita.plugins import bigquery
 
bq = bigquery(project="analytics-prod", dataset="warehouse")
 
agent = Agent(
    name="Data Analyst",
    prompt="You are a BigQuery analyst. Help users explore and query the data warehouse.",
    tools=[bq]
)
 
await agent.start()
 
result = await agent.run("""
List all datasets, then inspect the tables in the 'sales' dataset.
How many orders were placed last month?
""")
 
await agent.stop()

#Read-Only Mode

Restrict the agent to SELECT queries only:

python
bq = bigquery(project="prod-project", dataset="analytics", read_only=True)

In read-only mode, bigquery_execute is not exposed as a tool.

#Table Name Resolution

Tables are automatically qualified with project and dataset:

python
# These are equivalent when project="my-project" and dataset="analytics":
await bq.query("SELECT * FROM users LIMIT 5")
await bq.query("SELECT * FROM analytics.users LIMIT 5")
await bq.query("SELECT * FROM my-project.analytics.users LIMIT 5")
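
A sketch of what this qualification might look like (the plugin's internals may differ; note that BigQuery requires backtick quoting when a project ID contains hyphens, so the sketch quotes the full path):

```python
def qualify_table(table, project, dataset):
    """Expand a bare or partially qualified table name to project.dataset.table."""
    parts = table.split(".")
    if len(parts) == 1:           # "users" -> prepend project and dataset
        parts = [project, dataset] + parts
    elif len(parts) == 2:         # "analytics.users" -> prepend project
        parts = [project] + parts
    return "`" + ".".join(parts) + "`"
```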

#Context Manager Usage

python
from daita.plugins import bigquery
 
async with bigquery(project="my-project", dataset="analytics") as bq:
    results = await bq.query("SELECT COUNT(*) as cnt FROM events")
    print(f"Total events: {results[0]['cnt']}")
# Automatically disconnected

#Error Handling

python
from daita.plugins import bigquery
 
try:
    async with bigquery(project="my-project", dataset="analytics") as bq:
        results = await bq.query("SELECT * FROM users LIMIT 10")
except ImportError:
    print("Install BigQuery support: pip install 'daita-agents[bigquery]'")
except ValueError as e:
    print(f"Configuration error: {e}")

#Best Practices

Authentication:

  • Use Application Default Credentials in development
  • Use service accounts with minimal permissions in production
  • Never commit credential files to source control

Performance:

  • Always include LIMIT in exploratory queries — the tool auto-appends LIMIT 50 if omitted
  • Use partitioned and clustered tables for large datasets
  • Set appropriate timeout for long-running analytical queries
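
The auto-appended LIMIT behavior can be approximated as below; this is a sketch, and the tool's actual logic may cover more cases (subqueries, trailing comments, and so on):

```python
import re

def ensure_limit(sql, default_limit=50):
    """Append a LIMIT clause to a SELECT query that lacks one."""
    stripped = sql.strip().rstrip(";")
    if re.search(r"\bLIMIT\s+\d+\s*$", stripped, re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {default_limit}"
```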

Cost Management:

  • Use read_only=True in production to prevent accidental DML
  • Prefer count_rows() over SELECT COUNT(*) for simple counts
  • Use bigquery_inspect to understand schema before writing complex queries

#Troubleshooting

| Issue | Solution |
| --- | --- |
| google-cloud-bigquery not installed | pip install 'daita-agents[bigquery]' |
| BigQuery project is required | Set the project parameter or BIGQUERY_PROJECT env var |
| Dataset is required | Set the dataset parameter or BIGQUERY_DATASET env var |
| Permission denied | Check IAM roles; the agent needs BigQuery Data Viewer at minimum |
| Query timeout | Increase the timeout parameter or optimize the query |

#Next Steps