BigQuery Plugin

Google BigQuery data warehouse connection and querying for agents, built on `google-cloud-bigquery` with asyncio executors.

#Installation

bash
pip install 'daita-agents[bigquery]'

#Quick Start

python
from daita import Agent
from daita.plugins import bigquery
 
# Create plugin
bq = bigquery(
    project="my-gcp-project",
    dataset="analytics"
)
 
# Create agent with BigQuery access
agent = Agent(
    name="Warehouse Analyst",
    model="gpt-4o-mini",
    prompt="You are a data analyst. Query BigQuery to answer questions.",
    tools=[bq]
)
 
await agent.start()
result = await agent.run("What were our top 10 products by revenue last quarter?")

#Direct Usage

The plugin can be used directly without agents for programmatic access:

python
from daita.plugins import bigquery
 
async with bigquery(project="my-project", dataset="analytics") as bq:
    results = await bq.query("SELECT * FROM users LIMIT 10")
    tables = await bq.tables()
    schema = await bq.describe("users")

#Connection Parameters

python
bigquery(
    project: Optional[str] = None,
    dataset: Optional[str] = None,
    credentials_path: Optional[str] = None,
    location: Optional[str] = None,
    timeout: int = 300,
    read_only: bool = False
)

#Parameters

  • project (str): GCP project ID. Required; if not passed directly, falls back to the BIGQUERY_PROJECT or GOOGLE_CLOUD_PROJECT env vars
  • dataset (str): Default dataset for unqualified table references. Falls back to BIGQUERY_DATASET
  • credentials_path (str): Path to service account JSON key file. Falls back to GOOGLE_APPLICATION_CREDENTIALS. If omitted, Application Default Credentials are used
  • location (str): BigQuery location (default: "US"). Falls back to BIGQUERY_LOCATION
  • timeout (int): Query timeout in seconds (default: 300)
  • read_only (bool): Restrict the plugin to SELECT queries; bigquery_execute is not exposed as a tool (default: False)

#Environment Variables

| Variable | Maps to |
| --- | --- |
| `BIGQUERY_PROJECT` / `GOOGLE_CLOUD_PROJECT` | `project` |
| `BIGQUERY_DATASET` | `dataset` |
| `GOOGLE_APPLICATION_CREDENTIALS` | `credentials_path` |
| `BIGQUERY_LOCATION` | `location` |
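
The precedence here (explicit argument first, then environment variable, then a default) can be sketched as follows. This is a rough illustration only; `resolve_setting` is a hypothetical helper, and the plugin's actual resolution logic is internal:

```python
import os

def resolve_setting(explicit, *env_vars, default=None):
    """Pick the explicit value if given, else the first set env var, else a default."""
    if explicit is not None:
        return explicit
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            return value
    return default

# Explicit arguments win over environment variables.
os.environ["BIGQUERY_PROJECT"] = "env-project"
project = resolve_setting(None, "BIGQUERY_PROJECT", "GOOGLE_CLOUD_PROJECT")
location = resolve_setting(None, "BIGQUERY_LOCATION", default="US")
```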

#Authentication

#Application Default Credentials

bash
gcloud auth application-default login

python
bq = bigquery(project="my-project", dataset="analytics")

#Service Account Key

python
bq = bigquery(
    project="my-project",
    dataset="analytics",
    credentials_path="/path/to/service-account.json"
)

#Core Methods

#query()

Execute a SELECT query and return rows as a list of dicts:

python
results = await bq.query("SELECT name, email FROM users WHERE active = true LIMIT 100")
# [{"name": "Alice", "email": "alice@example.com"}, ...]

Parameterized queries use %s placeholders (automatically converted to BigQuery's @p0, @p1 syntax):

python
results = await bq.query(
    "SELECT * FROM events WHERE event_type = %s AND created_at > %s",
    params=["purchase", "2026-01-01"]
)
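
The %s-to-named-parameter conversion can be sketched roughly like this (a simplified illustration; the plugin's actual implementation may handle more edge cases, such as %s inside string literals):

```python
def convert_placeholders(sql, params):
    """Convert %s placeholders to BigQuery named parameters @p0, @p1, ..."""
    parts = sql.split("%s")
    if len(parts) - 1 != len(params):
        raise ValueError("placeholder/parameter count mismatch")
    out = parts[0]
    named = {}
    for i, (param, tail) in enumerate(zip(params, parts[1:])):
        name = f"p{i}"
        named[name] = param
        out += f"@{name}" + tail
    return out, named

sql, named = convert_placeholders(
    "SELECT * FROM events WHERE event_type = %s AND created_at > %s",
    ["purchase", "2026-01-01"],
)
```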

#execute()

Execute DML/DDL statements and return the number of affected rows:

python
affected = await bq.execute(
    "DELETE FROM logs WHERE created_at < %s",
    params=["2025-01-01"]
)
print(f"Deleted {affected} rows")

#tables()

List all tables in a dataset:

python
table_list = await bq.tables()                      # uses default dataset
table_list = await bq.tables(dataset="other_ds")    # specific dataset

#datasets()

List all datasets in the project:

python
ds_list = await bq.datasets()

#describe()

Get column schema for a table:

python
columns = await bq.describe("users")
# [{"column_name": "id", "data_type": "INT64", "is_nullable": "NO"}, ...]
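
Because describe() returns plain dicts, the schema is easy to post-process; for example, a small (hypothetical) formatter that renders it as compact text for a prompt:

```python
def format_schema(table, columns):
    """Render describe()-style rows as a compact, prompt-friendly listing."""
    lines = [f"{table}:"]
    for col in columns:
        null = "" if col["is_nullable"] == "YES" else " NOT NULL"
        lines.append(f"  {col['column_name']} {col['data_type']}{null}")
    return "\n".join(lines)

text = format_schema("users", [
    {"column_name": "id", "data_type": "INT64", "is_nullable": "NO"},
    {"column_name": "email", "data_type": "STRING", "is_nullable": "YES"},
])
```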

#count_rows() / sample_rows()

python
count = await bq.count_rows("events")
count = await bq.count_rows("events", filter="event_type = 'purchase'")
 
sample = await bq.sample_rows("events", n=10)

#Available Tools

When used with an agent, BigQuery exposes these LLM-callable tools:

| Tool | Description | Key Parameters |
| --- | --- | --- |
| bigquery_query | Run a SELECT query | sql, params, focus |
| bigquery_inspect | List tables and column schemas | dataset, tables |
| bigquery_count | Count rows (optionally filtered) | table, filter |
| bigquery_sample | Random sample of rows | table, n |
| bigquery_list_datasets | List all datasets | |
| bigquery_execute | Run DML/DDL (write mode only) | sql, params |

bigquery_execute is only available when the plugin is not in read-only mode.

#Tool Usage Example

python
from daita import Agent
from daita.plugins import bigquery
 
bq = bigquery(project="analytics-prod", dataset="warehouse")
 
agent = Agent(
    name="Data Analyst",
    prompt="You are a BigQuery analyst. Help users explore and query the data warehouse.",
    tools=[bq]
)
 
await agent.start()
 
result = await agent.run("""
List all datasets, then inspect the tables in the 'sales' dataset.
How many orders were placed last month?
""")
 
await agent.stop()

#Read-Only Mode

Restrict the agent to SELECT queries only:

python
bq = bigquery(project="prod-project", dataset="analytics", read_only=True)

In read-only mode, bigquery_execute is not exposed as a tool.

#Table Name Resolution

Tables are automatically qualified with project and dataset:

python
# These are equivalent when project="my-project" and dataset="analytics":
await bq.query("SELECT * FROM users LIMIT 5")
await bq.query("SELECT * FROM analytics.users LIMIT 5")
await bq.query("SELECT * FROM my-project.analytics.users LIMIT 5")
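
A sketch of what this qualification might look like (the plugin's internals may differ; note that BigQuery requires backtick quoting when a project ID contains hyphens, so the sketch quotes the full path):

```python
def qualify_table(table, project, dataset):
    """Expand a bare or partially qualified table name to project.dataset.table."""
    parts = table.split(".")
    if len(parts) == 1:           # "users" -> prepend project and dataset
        parts = [project, dataset] + parts
    elif len(parts) == 2:         # "analytics.users" -> prepend project
        parts = [project] + parts
    return "`" + ".".join(parts) + "`"
```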

#Context Manager Usage

python
from daita.plugins import bigquery
 
async with bigquery(project="my-project", dataset="analytics") as bq:
    results = await bq.query("SELECT COUNT(*) as cnt FROM events")
    print(f"Total events: {results[0]['cnt']}")
# Automatically disconnected

#Error Handling

python
from daita.plugins import bigquery
 
try:
    async with bigquery(project="my-project", dataset="analytics") as bq:
        results = await bq.query("SELECT * FROM users LIMIT 10")
except ImportError:
    print("Install BigQuery support: pip install 'daita-agents[bigquery]'")
except ValueError as e:
    print(f"Configuration error: {e}")

#Best Practices

Authentication:

  • Use Application Default Credentials in development
  • Use service accounts with minimal permissions in production
  • Never commit credential files to source control

Performance:

  • Always include LIMIT in exploratory queries — the tool auto-appends LIMIT 50 if omitted
  • Use partitioned and clustered tables for large datasets
  • Set appropriate timeout for long-running analytical queries
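
The auto-appended LIMIT behavior can be approximated as below; this is a sketch, and the tool's actual logic may cover more cases (subqueries, trailing comments, and so on):

```python
import re

def ensure_limit(sql, default_limit=50):
    """Append a LIMIT clause to a SELECT query that lacks one."""
    stripped = sql.strip().rstrip(";")
    if re.search(r"\bLIMIT\s+\d+\s*$", stripped, re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {default_limit}"
```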

Cost Management:

  • Use read_only=True in production to prevent accidental DML
  • Prefer count_rows() over SELECT COUNT(*) for simple counts
  • Use bigquery_inspect to understand schema before writing complex queries

#Troubleshooting

| Issue | Solution |
| --- | --- |
| google-cloud-bigquery not installed | pip install 'daita-agents[bigquery]' |
| BigQuery project is required | Set the project parameter or BIGQUERY_PROJECT env var |
| Dataset is required | Set the dataset parameter or BIGQUERY_DATASET env var |
| Permission denied | Check IAM roles; the agent needs BigQuery Data Viewer at minimum |
| Query timeout | Increase the timeout parameter or optimize the query |

#Next Steps