
Memory Plugin

Production-ready semantic memory for Daita agents with automatic local/cloud detection and intelligent curation.

#Quick Start

python
from daita import Agent
from daita.plugins import MemoryPlugin
 
# Create plugin (automatically project-scoped and isolated per agent)
memory = MemoryPlugin()
 
# Add to agent - memory is now persistent across runs
agent = Agent(
    name="Research Assistant",
    prompt="You are a research assistant. Use memory to track important findings.",
    tools=[memory]
)
 
await agent.start()
 
# Agent can now remember and recall information autonomously
result = await agent.run("Remember that the user prefers Python over JavaScript")
# Later...
result = await agent.run("What programming language does the user prefer?")
 
await agent.stop()

#Direct Usage

The plugin can also be used directly, without an agent, for programmatic memory management. Its main value, though, is agent integration: it lets LLMs store and retrieve context across conversations autonomously using semantic search.

#Quick Construction

Use the memory() factory for concise plugin creation:

python
from daita.plugins import memory
 
mem = memory(workspace="my_project", auto_curate="on_stop")

This is equivalent to MemoryPlugin(...) and accepts the same parameters.

#Configuration Parameters

python
MemoryPlugin(
    workspace: Optional[str] = None,
    scope: str = "project",
    auto_curate: str = "on_stop",
    curation_provider: Optional[str] = None,
    curation_model: Optional[str] = None,
    curation_api_key: Optional[str] = None,
    embedding_provider: str = "openai",
    embedding_model: str = "text-embedding-3-small",
    embedder: Optional[BaseEmbeddingProvider] = None,
    enable_reranking: bool = False,
    enable_fact_extraction: bool = False,
    enable_working_memory: bool = False,
    enable_reinforcement: bool = False,
    enable_memory_graph: bool = False,
    max_chunks: int = 2000,
    default_ttl_days: Optional[int] = None,
    tier: str = "basic",
    memory_tools: Optional[List[str]] = None,
    dedup_threshold: float = 0.95,
)

#Parameters

  • workspace (str): Workspace name for memory isolation. Default: auto-generated from agent name for stable persistence across runs
  • scope (str): Memory scope - "project" (default, stored in .daita/memory/) or "global" (stored in ~/.daita/memory/)
  • auto_curate (str): Curation trigger mode - "on_stop" (default) or "manual"
  • curation_provider (str): LLM provider for curation ("openai", "anthropic", etc.). Default: "openai"
  • curation_model (str): LLM model for curation. Default: "gpt-4o-mini"
  • curation_api_key (str): API key for curation LLM. Default: uses global settings
  • embedding_provider (str): Provider name for semantic embeddings. Default: "openai". Ignored if embedder is provided
  • embedding_model (str): Model for embeddings. Default: "text-embedding-3-small". Ignored if embedder is provided
  • embedder (BaseEmbeddingProvider): Pre-constructed embedding provider instance. Takes precedence over embedding_provider/embedding_model. See Embedding Providers for available providers
  • enable_reranking (bool): Re-score recall results with an LLM for higher precision. Default: False
  • enable_fact_extraction (bool): Extract structured facts from memories at ingestion time for richer recall. Default: False
  • enable_working_memory (bool): Enable session-scoped scratchpad. Adds scratch and think tools. Default: False
  • enable_reinforcement (bool): Enable outcome-based learning. Adds reinforce tool for recording whether recalled memories led to good or bad outcomes. Default: False
  • enable_memory_graph (bool): Enable knowledge graph over memories. Adds traverse_memory and query_facts tools. Default: False
  • max_chunks (int): Maximum stored chunks before LRU-style eviction (default: 2000). Keeps memory bounded for long-running agents
  • default_ttl_days (int): Default time-to-live in days for new memories. None means no expiry (default). Can be overridden per-memory via the ttl_days parameter on remember()
  • tier (str): Tool tier controlling which tools are exposed. "basic" (default), "analysis", or "full". See Tool Tiers
  • memory_tools (list): Explicit list of tool names to expose. Overrides tier when set
  • dedup_threshold (float): Cosine similarity threshold for deduplication (default: 0.95). Lower values are more aggressive at deduplication
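
To make the dedup_threshold semantics concrete, here is a minimal sketch of a cosine-similarity duplicate check. It is illustrative only: cosine_similarity and is_duplicate are hypothetical helpers, not part of the Daita API, and the two-dimensional vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(
    new_vec: Sequence[float],
    existing_vecs: List[Sequence[float]],
    threshold: float = 0.95,
) -> bool:
    """Treat the new item as a duplicate if any stored vector is at least `threshold` similar."""
    return any(cosine_similarity(new_vec, vec) >= threshold for vec in existing_vecs)


stored = [[1.0, 0.0], [0.0, 1.0]]
near_copy = [0.9, 0.3]  # cosine similarity to [1.0, 0.0] is roughly 0.949

# With the default threshold the item is kept; a lower threshold rejects it.
kept_at_default = not is_duplicate(near_copy, stored, threshold=0.95)
dropped_when_lower = is_duplicate(near_copy, stored, threshold=0.90)
```

The same vector passes at 0.95 and is rejected at 0.90, which is why lower thresholds deduplicate more aggressively.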

#Memory Scopes & Workspaces

Scope controls where memory is stored:

python
# Project-scoped (default) - memory stays with this project
memory = MemoryPlugin(scope="project")
# Location: .daita/memory/workspaces/{workspace}/
 
# Global - memory accessible across all projects
memory = MemoryPlugin(scope="global")
# Location: ~/.daita/memory/workspaces/{workspace}/
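
The two storage locations can be summarized as a small path-resolution rule. The sketch below only restates the paths listed above; resolve_memory_dir is a hypothetical helper written for illustration and is not a Daita function.

```python
from pathlib import Path


def resolve_memory_dir(scope: str, workspace: str, project_root: Path) -> Path:
    """Map (scope, workspace) to the directory layout described in this section."""
    if scope == "project":
        base = project_root / ".daita"   # lives next to the project
    elif scope == "global":
        base = Path.home() / ".daita"    # lives in the user's home directory
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return base / "memory" / "workspaces" / workspace


project_dir = resolve_memory_dir("project", "researcher", Path("/srv/app"))
global_dir = resolve_memory_dir("global", "researcher", Path("/srv/app"))
```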

Workspace controls memory isolation:

python
# Isolated (default) - each agent has its own memory
agent = Agent("Researcher", tools=[MemoryPlugin()])
# Workspace: "researcher" (auto-generated from agent name)
 
# Shared - multiple agents share the same memory
shared_memory = MemoryPlugin(workspace="research_team")
agent1 = Agent("Researcher", tools=[shared_memory])
agent2 = Agent("Analyst", tools=[shared_memory])
# Both agents access workspace: "research_team"

Cloud Deployment:

  • Automatically detects cloud environment
  • Uses AWS storage for persistence across serverless invocations

#Using with Agents

The Memory plugin exposes semantic memory operations as tools that agents can call autonomously:

python
from daita import Agent
from daita.plugins import MemoryPlugin
 
# Create memory plugin with custom configuration
memory = MemoryPlugin(
    auto_curate="on_stop"   # Curate when agent stops
)
 
# Pass plugin to agent - agent can now use memory tools autonomously
agent = Agent(
    name="Personal Assistant",
    prompt="""You are a personal assistant. Use your memory to:
    - Remember user preferences and important facts
    - Recall relevant context from past conversations
    - Build a knowledge base over time""",
    llm_provider="openai",
    model="gpt-4",
    tools=[memory]
)
 
await agent.start()
 
# Agent autonomously uses memory tools
result = await agent.run("Remember: I'm allergic to peanuts and prefer dark mode")
# Later conversation...
result = await agent.run("What are my dietary restrictions?")
# Agent uses recall() to find relevant memories
 
await agent.stop()

#Available Tools

The Memory plugin exposes these tools to LLM agents. Which tools are available depends on the tier parameter and feature flags.

#Core Tools (all tiers)

| Tool | Description | Parameters |
|---|---|---|
| remember | Store information in long-term memory | content (str or list of dicts), importance (float: 0.5), category (optional), ttl_days (optional), promote_key (optional) |
| recall | Search memories semantically | query, limit (5), score_threshold (0.6), min_importance, max_importance, category, since, before |
| read_memory | Read complete memory file | file (default: "MEMORY.md", or "today") |
| list_memories | List all memory files | None |

#Analysis & Full Tier Tools

| Tool | Description | Parameters | Requires |
|---|---|---|---|
| query_facts | Query structured entity-relation-value triples | entity (optional) | enable_fact_extraction=True |
| traverse_memory | Walk entity connections across memories | entity (required), depth (int: 2) | enable_memory_graph=True |
| reinforce | Record positive/negative outcome for a memory | chunk_ids (list), outcome ("positive"/"negative") | enable_reinforcement=True |
| list_by_category | Enumerate all memories in a category | category (required), min_importance, limit | |

#Working Memory Tools

| Tool | Description | Parameters | Requires |
|---|---|---|---|
| scratch | Write to session-scoped scratchpad | content (str), key (optional) | enable_working_memory=True |
| think | Search working memory by keyword | query (str), limit (int: 5) | enable_working_memory=True |

Tool Categories: memory
Tool Source: plugin

#Tool Usage Example

python
from daita import Agent
from daita.plugins import MemoryPlugin
 
# Setup memory with custom curation
memory = MemoryPlugin(
    workspace="project_alpha",      # Shared workspace
    auto_curate="on_stop"           # Curate when agent stops
)
 
agent = Agent(
    name="Project Manager",
    prompt="You are a project manager. Track decisions, tasks, and key information.",
    llm_provider="openai",
    model="gpt-4",
    tools=[memory]
)
 
await agent.start()
 
# Agent uses memory tools autonomously
result = await agent.run("""
Store project information:
- Client prefers weekly status updates on Mondays
- Budget approved: $50,000
- Deadline: March 15, 2024
- Tech stack: Python, FastAPI, PostgreSQL
""")
 
# Later, retrieve context
result = await agent.run("What's our project deadline and budget?")
# Agent uses recall() to find relevant information
 
# Check specific memory file
result = await agent.run("Show me the long-term memory file")
# Agent uses read_memory() to display full content
 
await agent.stop()

#Batch Storage

New in 0.14.0. remember() accepts a list of dicts for batch ingestion. This uses a single embedding API call, making it significantly more efficient than storing items one at a time:

python
await agent.run("""
Remember the following facts:
- Q1 revenue was $4.2M (importance: 0.8, category: financial)
- New VP of Eng starts March 15 (category: people)
- Sprint velocity averaged 42 points (importance: 0.6)
""")

The agent translates this to a batch call internally:

python
await remember([
    {"content": "Q1 revenue was $4.2M", "importance": 0.8, "category": "financial"},
    {"content": "New VP of Eng starts March 15", "category": "people"},
    {"content": "Sprint velocity averaged 42 points", "importance": 0.6},
])

Each item in the batch can specify its own importance and category. Items without these fields use auto-classification.

#Memory TTL

New in 0.14.0. Memories can be set to expire automatically:

python
# Plugin-wide default: all memories expire after 90 days
memory = MemoryPlugin(default_ttl_days=90)
 
# Per-memory override
await remember("Staging environment is frozen until Friday's release", ttl_days=7, importance=0.9)

  • Expired memories are pruned at session start/stop
  • Memories with no TTL (None) never expire
  • Per-memory ttl_days overrides default_ttl_days
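
The three rules above can be expressed as a short expiry check. This is a sketch of the described behaviour, not the plugin's code: effective_ttl and is_expired are hypothetical helper names, and treating a memory as expired once its age reaches the TTL exactly is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_ttl(ttl_days: Optional[int], default_ttl_days: Optional[int]) -> Optional[int]:
    """A per-memory ttl_days wins; otherwise fall back to the plugin-wide default."""
    return ttl_days if ttl_days is not None else default_ttl_days


def is_expired(created_at: datetime, ttl_days: Optional[int], now: datetime) -> bool:
    """Memories without a TTL never expire; others expire once their age reaches the TTL."""
    if ttl_days is None:
        return False
    return now >= created_at + timedelta(days=ttl_days)


created = datetime(2026, 1, 1)
now = datetime(2026, 1, 10)

short_lived = is_expired(created, effective_ttl(7, 90), now)        # 7-day override has passed
default_only = is_expired(created, effective_ttl(None, 90), now)    # 90-day default has not
no_ttl = is_expired(created, effective_ttl(None, None), now)        # no TTL at all
```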

#Time-Aware Recall

New in 0.14.0. recall() supports since and before parameters for time-bounded searches. These accept ISO datetimes or relative shorthand:

python
# Memories from the last 24 hours
result = await recall("deployment issues", since="24h")
 
# Memories from the last 7 days
result = await recall("financial data", since="7d")
 
# Memories before a specific date
result = await recall("old decisions", before="2026-01-01T00:00:00")

Supported shorthand: "24h", "7d", "30d" (hours or days).
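
As a rough illustration of how such values could be interpreted, the sketch below converts either form into an absolute cutoff. parse_time_bound is a hypothetical helper, not a Daita function, and the example uses naive datetimes for simplicity.

```python
import re
from datetime import datetime, timedelta

# "<number>h" or "<number>d", e.g. "24h" or "7d"
_SHORTHAND = re.compile(r"^(\d+)([hd])$")


def parse_time_bound(value: str, now: datetime) -> datetime:
    """Turn relative shorthand or an ISO datetime string into an absolute datetime."""
    match = _SHORTHAND.match(value.strip())
    if match:
        amount = int(match.group(1))
        unit = match.group(2)
        delta = timedelta(hours=amount) if unit == "h" else timedelta(days=amount)
        return now - delta
    # Anything else is treated as an ISO 8601 datetime; invalid input raises ValueError.
    return datetime.fromisoformat(value)


now = datetime(2026, 1, 8, 12, 0, 0)
last_day = parse_time_bound("24h", now)
last_week = parse_time_bound("7d", now)
absolute = parse_time_bound("2026-01-01T00:00:00", now)
```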

#Auto-Classification

New in 0.14.0. When category or importance are not explicitly set, the Memory plugin automatically classifies memories using lightweight heuristics (no LLM call required). This ensures consistent categorization without additional API cost.

#Tool Tiers

New in 0.15.0. The tier parameter controls which tools are exposed to the agent, letting you match complexity to the use case:

| Tier | Tools Included |
|---|---|
| "basic" | remember, recall, read_memory, list_memories |
| "analysis" | Basic + query_facts, traverse_memory, reinforce, list_by_category |
| "full" | All tools |

python
# Simple agent — just needs remember/recall
memory = MemoryPlugin(tier="basic")
 
# Research agent — needs graph traversal and fact queries
memory = MemoryPlugin(
    tier="analysis",
    enable_fact_extraction=True,
    enable_memory_graph=True,
    enable_reinforcement=True,
)
 
# Full control — pick exactly which tools to expose
memory = MemoryPlugin(memory_tools=["remember", "recall", "scratch", "think"])

The memory_tools parameter overrides tier when set, giving fine-grained control.
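
One way to picture how tier, feature flags, and memory_tools interact is the resolution sketch below. It is built only from the tier and Available Tools tables; resolve_tools is a hypothetical function, and two details are assumptions rather than documented behaviour: that flag-dependent tools are hidden when their flag is off, and that an explicit memory_tools list is returned unchanged.

```python
from typing import List, Optional

BASIC = ["remember", "recall", "read_memory", "list_memories"]
ANALYSIS_EXTRA = ["query_facts", "traverse_memory", "reinforce", "list_by_category"]
WORKING = ["scratch", "think"]

# Feature flag each tool depends on, per the "Requires" column of the tool tables.
REQUIRES = {
    "query_facts": "enable_fact_extraction",
    "traverse_memory": "enable_memory_graph",
    "reinforce": "enable_reinforcement",
    "scratch": "enable_working_memory",
    "think": "enable_working_memory",
}


def resolve_tools(
    tier: str = "basic",
    memory_tools: Optional[List[str]] = None,
    **flags: bool,
) -> List[str]:
    """Return the tool names an agent would see for a given configuration."""
    if memory_tools is not None:
        return list(memory_tools)  # explicit list overrides the tier
    if tier == "basic":
        candidates = BASIC
    elif tier == "analysis":
        candidates = BASIC + ANALYSIS_EXTRA
    elif tier == "full":
        candidates = BASIC + ANALYSIS_EXTRA + WORKING
    else:
        raise ValueError(f"unknown tier: {tier!r}")
    # Drop tools whose required feature flag is not enabled.
    return [t for t in candidates if t not in REQUIRES or flags.get(REQUIRES[t], False)]


basic_tools = resolve_tools()
graph_tools = resolve_tools("analysis", enable_memory_graph=True)
explicit = resolve_tools("full", memory_tools=["remember", "recall"])
```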

#Working Memory

New in 0.15.0. Working memory is a session-scoped scratchpad — in-memory only, no disk, no embeddings, no API calls. It is designed for agents that need to take notes during a task and optionally promote important findings to long-term memory.

python
memory = MemoryPlugin(enable_working_memory=True, tier="full")
 
agent = Agent(
    name="Researcher",
    prompt="Take scratch notes as you work, then promote key findings to memory.",
    tools=[memory]
)

The agent can use two working memory tools:

  • scratch(content, key=None) — Write to the scratchpad. Returns the assigned key. If key is omitted, an auto-incrementing key (scratch_1, scratch_2, ...) is used.
  • think(query, limit=5) — Search the scratchpad by keyword. Returns matching items sorted by relevance.

Working memory auto-evicts when the agent stops. To promote a scratch item to long-term memory, use remember(promote_key="scratch_1").
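
The scratchpad behaviour described here can be approximated in a few lines of plain Python. The Scratchpad class below is a hypothetical stand-in for illustration, not the plugin's implementation; in particular, ranking by keyword occurrence count is an assumption about what "sorted by relevance" means.

```python
from typing import Dict, List, Optional, Tuple


class Scratchpad:
    """In-memory, session-scoped notes: no disk, no embeddings, no API calls."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}
        self._counter = 0

    def scratch(self, content: str, key: Optional[str] = None) -> str:
        """Store a note and return its key; auto-generate scratch_N when no key is given."""
        if key is None:
            self._counter += 1
            key = f"scratch_{self._counter}"
        self._items[key] = content
        return key

    def think(self, query: str, limit: int = 5) -> List[Tuple[str, str]]:
        """Keyword search: rank notes by how often the query terms occur in them."""
        terms = query.lower().split()
        scored = []
        for key, content in self._items.items():
            text = content.lower()
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, key, content))
        scored.sort(key=lambda item: item[0], reverse=True)  # stable sort keeps insertion order on ties
        return [(key, content) for _, key, content in scored[:limit]]

    def clear(self) -> None:
        """Drop everything, as happens when the agent stops."""
        self._items.clear()


pad = Scratchpad()
k1 = pad.scratch("Redis cache hit rate dropped to 60%")
k2 = pad.scratch("Postgres replica lag is 4s", key="db_note")
k3 = pad.scratch("cache eviction policy is LRU; cache size 2GB")
hits = pad.think("cache")
```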

#Memory Graph

New in 0.15.0; updated in 0.17.0. The memory graph builds a lightweight knowledge graph over stored memories, connecting entities via relationships. This enables traversal queries that pure cosine similarity would miss — for example, "what are all the infrastructure constraints for Project Orion?"

python
memory = MemoryPlugin(
    enable_memory_graph=True,
    enable_fact_extraction=True,   # recommended for richer graphs
    tier="analysis",
)

In 0.17.0, memory graph domain types moved out of the core graph vocabulary and into the memory plugin. Most applications do not need to change anything, but code that imports memory graph node or edge enums directly should use the memory-owned models:

python
from daita.plugins.memory.graph_models import (
    MemoryEdgeType,
    MemoryGraphEdge,
    MemoryGraphNode,
    MemoryNodeType,
)
from daita.plugins.memory.graph_store import GraphBackendMemoryGraphStore

GraphBackendMemoryGraphStore adapts the shared core graph backend for memory-domain records, so memory graphs still use the same persistence, locking, and traversal mechanics as other graph-backed features.

#How It Works

When a memory is stored, the graph layer:

  1. Creates a memory node for the stored chunk
  2. Extracts entity nodes — from LLM-produced facts (if fact extraction is enabled) or zero-LLM keyword heuristics (backtick code references, capitalized phrases, table.column patterns, quoted strings)
  3. Creates MENTIONS edges from memory nodes to entity nodes
  4. Creates RELATED_TO edges between entities that co-occur in the same fact
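
The steps above can be sketched with regular expressions and plain tuples. This is a simplified illustration, not the plugin's graph code: extract_entities and build_edges are hypothetical names, the regex patterns are one guess at the keyword heuristics, and facts are passed in by hand where the plugin would obtain them from an LLM.

```python
import re
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, edge_type, target)


def extract_entities(text: str) -> Set[str]:
    """Zero-LLM heuristics: backtick references, quoted strings, table.column, capitalized phrases."""
    entities: Set[str] = set()
    entities.update(re.findall(r"`([^`]+)`", text))
    entities.update(re.findall(r'"([^"]+)"', text))
    entities.update(re.findall(r"\b[a-z_]+\.[a-z_]+\b", text))
    entities.update(re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text))
    return entities


def build_edges(memory_id: str, text: str, facts: List[Tuple[str, str, str]]) -> List[Edge]:
    """MENTIONS edges from the memory to each entity, RELATED_TO edges within each fact."""
    edges: List[Edge] = [
        (memory_id, "MENTIONS", entity) for entity in sorted(extract_entities(text))
    ]
    for subject, _relation, value in facts:
        edges.append((subject, "RELATED_TO", value))
    return edges


text = "Project Orion stores orders in `orders_db` and joins on users.id"
facts = [("Project Orion", "uses", "orders_db")]  # stand-in for LLM-extracted facts
edges = build_edges("mem_1", text, facts)
```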

#Traversal

The traverse_memory tool performs a BFS walk from a named entity, returning connected entities and the memories that mention them:

python
# Agent calls:
# traverse_memory(entity="Project Orion", depth=2)
#
# Returns entities and memories connected within 2 hops
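
For readers unfamiliar with depth-limited BFS, here is a minimal, self-contained version over a plain adjacency dictionary. The traverse function and the sample graph are illustrative assumptions and say nothing about how the plugin stores its graph.

```python
from collections import deque
from typing import Dict, List


def traverse(graph: Dict[str, List[str]], start: str, depth: int = 2) -> Dict[str, int]:
    """Return every node reachable from `start` within `depth` hops, mapped to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


graph = {
    "Project Orion": ["mem_1", "PostgreSQL"],
    "mem_1": ["Project Orion"],
    "PostgreSQL": ["Project Orion", "RDS"],
    "RDS": ["PostgreSQL", "us-east-1"],
    "us-east-1": ["RDS"],
}

reachable = traverse(graph, "Project Orion", depth=2)
```

In this toy graph, us-east-1 sits three hops from the start node, so a depth of 2 leaves it out.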

The query_facts tool returns structured entity-relation-value triples extracted from memories:

python
# Agent calls:
# query_facts(entity="PostgreSQL")
#
# Returns facts like: PostgreSQL -> runs_on -> RDS, PostgreSQL -> version -> 15.2

#Entity Quality

The graph includes quality filters to avoid noise:

  • Temporal phrases ("as of 2025", "before Q3") are excluded
  • Currency amounts and bare numbers are excluded
  • Generic nouns ("challenges", "data", "things") are excluded
  • Technical identifiers (snake_case, dot.notation) are preserved as high-value entities
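
A filter along these lines might look like the sketch below. The patterns and the keep_entity name are assumptions made for illustration; the plugin's actual rules and word lists are not documented here.

```python
import re

GENERIC_NOUNS = {"challenges", "data", "things"}
TEMPORAL = re.compile(r"^(as of|before|after|since|until)\b", re.IGNORECASE)
NUMERIC = re.compile(r"^[$]?\d[\d,.]*[kKmMbB%]?$")          # bare numbers and currency amounts
IDENTIFIER = re.compile(r"^[a-z0-9]+(?:[_.][a-z0-9]+)+$")   # snake_case or dot.notation


def keep_entity(candidate: str) -> bool:
    """Return True if the candidate looks like a meaningful entity."""
    text = candidate.strip()
    if not text:
        return False
    if NUMERIC.match(text):          # checked first so "15.2" is not mistaken for dot.notation
        return False
    if IDENTIFIER.match(text):       # technical identifiers are always kept
        return True
    if TEMPORAL.match(text):
        return False
    if text.lower() in GENERIC_NOUNS:
        return False
    return True


candidates = ["as of 2025", "before Q3", "$50,000", "42", "data", "user_id", "orders.customer_id", "Project Orion"]
kept = [c for c in candidates if keep_entity(c)]
```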

#Reinforcement Learning

New in 0.15.0. Reinforcement learning lets agents record whether recalled memories were helpful, adjusting their effective scores in future recall.

python
memory = MemoryPlugin(enable_reinforcement=True, tier="analysis")

The agent uses the reinforce tool to provide feedback:

python
# After using recall results successfully:
# reinforce(chunk_ids=["abc123", "def456"], outcome="positive")
 
# After recall results were unhelpful:
# reinforce(chunk_ids=["ghi789"], outcome="negative")

Positive reinforcement boosts a memory's score in future recall; negative reinforcement lowers it. This creates a feedback loop where the most useful memories surface more readily over time.
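
The feedback loop can be illustrated with a toy scoring rule. The formula below (a bounded multiplicative adjustment per net positive outcome) is purely an assumption chosen for the example; the plugin's real weighting is not specified in this document, and effective_score is a hypothetical name.

```python
from typing import Dict, List, Tuple


def effective_score(
    similarity: float,
    positives: int,
    negatives: int,
    step: float = 0.1,
    max_shift: float = 0.5,
) -> float:
    """Scale similarity up or down by net feedback, capped at +/- max_shift."""
    shift = max(-max_shift, min(max_shift, step * (positives - negatives)))
    return similarity * (1.0 + shift)


# chunk_id -> (positive outcomes, negative outcomes)
feedback: Dict[str, Tuple[int, int]] = {"abc123": (2, 0), "ghi789": (0, 1)}

# (chunk_id, raw similarity) as returned by a vector search
candidates: List[Tuple[str, float]] = [("ghi789", 0.82), ("abc123", 0.80)]

ranked = sorted(
    candidates,
    key=lambda c: effective_score(c[1], *feedback.get(c[0], (0, 0))),
    reverse=True,
)
```

Here the chunk with two positive outcomes overtakes one that started with a slightly higher raw similarity but received negative feedback.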

#Content Preprocessing

New in 0.15.0. At ingestion time, content is automatically split into two representations:

  • Storage content — the original text, stored in the database and daily log
  • Index content — a cleaned version used for embedding, deduplication, and fact extraction

The index representation strips structural noise — code blocks, inline code, markdown formatting, bullet prefixes — so the embedding captures the factual signal rather than formatting. This prevents structurally identical but factually different memories (e.g. two table schemas) from appearing as near-duplicates.

No configuration required — preprocessing runs automatically on all remember() calls.
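
To give a feel for what the index representation might look like, the sketch below strips the kinds of structural noise listed above using regular expressions. to_index_content is a hypothetical helper and the exact rules are assumptions; the plugin's own preprocessing may differ.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example easy to embed
FENCED_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def to_index_content(text: str) -> str:
    """Produce a cleaned version of `text` suitable for embedding and deduplication."""
    text = FENCED_BLOCK.sub(" ", text)                                   # fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)                                 # inline code
    text = re.sub(r"^\s*(?:[-*+]|\d+\.)\s+", "", text, flags=re.MULTILINE)  # bullet prefixes
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)           # heading markers
    text = re.sub(r"(\*\*|\*)(.+?)\1", r"\2", text)                      # bold / italic asterisks
    return re.sub(r"\s+", " ", text).strip()                             # collapse whitespace


raw = (
    "## Schema\n"
    "- **orders** table has `customer_id`\n"
    "- Budget is *approved*\n"
    + FENCE + "sql\nSELECT 1;\n" + FENCE
)

storage_content = raw                    # original text is stored unchanged
index_content = to_index_content(raw)    # cleaned text is what gets embedded
```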

#Custom Embeddings

New in 0.15.0. Inject a custom embedding provider instead of using the default OpenAI embeddings:

python
from daita.plugins import MemoryPlugin
from daita.embeddings import create_embedding_provider
 
# Use local embeddings (no API key needed)
embedder = create_embedding_provider(
    "sentence-transformers",
    model="all-MiniLM-L6-v2"
)
 
memory = MemoryPlugin(embedder=embedder)

The embedder parameter accepts any BaseEmbeddingProvider instance and takes precedence over the embedding_provider/embedding_model string parameters. See Embedding Providers for all available providers.


#Direct Memory Operations (Scripts)

For scripts that need memory operations, use a lightweight agent:

python
import asyncio
from daita import Agent
from daita.plugins import MemoryPlugin
 
async def main():
    memory = MemoryPlugin(workspace="analytics", auto_curate="manual")
 
    agent = Agent(
        name="Memory Manager",
        model="gpt-4o-mini",
        prompt="You are a memory manager. Store and retrieve information as instructed.",
        tools=[memory]
    )
 
    await agent.start()
 
    # Store information
    await agent.run(
        "Remember with importance 0.8 and category 'financial': "
        "Q4 revenue exceeded projections by 15%"
    )
 
    # Search memories
    result = await agent.run("What do you know about revenue projections?")
    print(result)
 
    await agent.stop()
 
asyncio.run(main())

#Advanced Memory Management

#Programmatic Curation

Daita Cloud only. curate() requires a managed cloud deployment. On local installs it raises RuntimeError. Automatic summarisation via regenerate_memory_md() is available locally when auto_curate="on_stop".

python
from daita import Agent
from daita.plugins import MemoryPlugin
 
memory = MemoryPlugin(auto_curate="manual")
agent = Agent("Analyst", tools=[memory])
await agent.start()
 
# Run agent interactions...
result = await agent.run("Analyze today's data...")
 
# Manually trigger curation (Daita Cloud only)
curation_result = await memory.curate()
print(f"Added {curation_result.facts_added} facts")
print(f"Cost: ${curation_result.cost_usd:.4f}")
 
await agent.stop()

#Importance Scoring

python
# Mark specific memories as important
result = await memory.mark_important(
    query="project deadline",
    importance=0.9,
    source="user_explicit"
)
 
# Pin critical memories (never pruned)
result = await memory.pin(query="client data retention policy")
print(f"Pinned {result['updated']} memories")
 
# Remove outdated memories
result = await memory.forget(query="old API credentials")
print(f"Deleted {result['deleted']} memories")

#Runtime Configuration

python
# Update configuration dynamically
memory.configure(auto_curate="manual")   # Switch to manual curation
memory.configure(auto_curate="on_stop")  # Switch back to automatic

#Intelligence Features

#LLM Reranking

When enable_reranking=True, the plugin runs a lightweight LLM scoring pass over the top recall candidates before returning them and reorders the results by relevance to the query. This adds the latency of one extra LLM pass in exchange for more precise recall results.

python
from daita import Agent
from daita.plugins import memory
 
# Curator agent provides the LLM for reranking
curator = Agent(name="Curator", model="gpt-4o-mini", prompt="You curate memories.")
 
mem = memory(
    workspace="research",
    enable_reranking=True,
    curator=curator
)
 
agent = Agent(
    name="Research Assistant",
    prompt="Use memory to track findings.",
    tools=[mem]
)

Reranking is most useful when your memory store is large and baseline vector similarity returns many near-equal candidates.
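
Conceptually, reranking is a second scoring pass over candidates that vector search has already shortlisted. The sketch below shows that shape with a pluggable scoring function; rerank and toy_scorer are hypothetical names, and the word-overlap scorer is only a deterministic stand-in for the LLM call the plugin would make.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[str]:
    """Re-score (text, similarity) candidates with score_fn and return the best top_k texts."""
    rescored = [(score_fn(query, text), text) for text, _similarity in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _score, text in rescored[:top_k]]


def toy_scorer(query: str, text: str) -> float:
    """Fraction of query words that appear in the text (placeholder for an LLM judgment)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


# Near-equal vector similarities: the situation where reranking helps most.
candidates = [
    ("Budget meeting moved to Tuesday", 0.81),
    ("Project deadline is March 15", 0.80),
    ("Deadline for expense reports is Friday", 0.79),
]

top = rerank("project deadline", candidates, toy_scorer, top_k=2)
```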

#Fact Extraction

When enable_fact_extraction=True, the plugin parses each stored memory through an LLM at ingestion time and stores structured ExtractedFact records alongside the raw content. Facts capture entities, relationships, dates, and numeric values, enabling more precise temporal and relational recall.

python
mem = memory(
    workspace="project_alpha",
    enable_fact_extraction=True,
    curator=curator
)

Extracted facts are stored in the memory's metadata under extracted_facts and are used automatically during recall(). Enabling this option adds one LLM call per remember() invocation.

#Query Routing

The Memory plugin automatically routes each recall query to a retrieval strategy (vector search, keyword search, or hybrid) using an internal QueryRouter. No configuration is required: routing happens transparently based on query characteristics.

#Curation System

The Memory Plugin includes intelligent curation that extracts important facts from daily logs and stores them in long-term memory.

Curation Process:

  1. Analyzes daily conversation logs
  2. Extracts key facts, preferences, and decisions using LLM
  3. Assigns importance scores (0.0-1.0) to each fact
  4. Merges similar facts to prevent redundancy
  5. Stores in long-term memory with semantic embeddings

Curation Modes:

python
# Automatic on agent stop (default)
MemoryPlugin(auto_curate="on_stop")
 
# Manual trigger only
MemoryPlugin(auto_curate="manual")

Curation Result:

python
curation_result = await memory.curate()
 
# Access results
print(f"Success: {curation_result.success}")
print(f"Facts extracted: {curation_result.facts_extracted}")
print(f"Facts added: {curation_result.facts_added}")
print(f"Memories updated: {curation_result.memories_updated}")
print(f"Memories pruned: {curation_result.memories_pruned}")
print(f"Tokens used: {curation_result.tokens_used}")
print(f"Cost: ${curation_result.cost_usd:.4f}")

#Best Practices

Memory Organization:

  • Use project scope for project-specific context (default)
  • Use global scope for cross-project knowledge (user preferences, general facts)
  • Create shared workspaces for team collaboration across agents
  • Keep isolated workspaces (default) for independent agent tasks

Performance:

  • Let auto-curation run on agent stop (default) - balances freshness and cost
  • Use auto_curate="manual" for long-running agents where you control timing
  • Set score_threshold in recall() to filter low-relevance results (default: 0.6)
  • Use importance filters to focus on high-value memories
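
The effect of score_threshold and the importance filters can be pictured as a post-filter over scored results. This is a sketch only: filter_results is a hypothetical helper, and the "score" and "importance" dictionary keys are assumed field names, not the plugin's result schema.

```python
from typing import Dict, List, Optional


def filter_results(
    results: List[Dict],
    score_threshold: float = 0.6,
    min_importance: Optional[float] = None,
    max_importance: Optional[float] = None,
) -> List[Dict]:
    """Keep results that clear the relevance threshold and fall inside the importance range."""
    kept = []
    for item in results:
        if item["score"] < score_threshold:
            continue
        if min_importance is not None and item["importance"] < min_importance:
            continue
        if max_importance is not None and item["importance"] > max_importance:
            continue
        kept.append(item)
    return kept


results = [
    {"content": "Deadline is March 15", "score": 0.82, "importance": 0.9},
    {"content": "Team lunch on Friday", "score": 0.65, "importance": 0.2},
    {"content": "Office plants watered", "score": 0.40, "importance": 0.1},
]

relevant = filter_results(results)                            # default threshold of 0.6
high_value = filter_results(results, min_importance=0.5)      # also require importance >= 0.5
```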

Cost Management:

  • Use gpt-4o-mini for curation (default) - balances quality and cost
  • Manual curation mode gives full control over when LLM calls occur
  • Monitor curation costs via CurationResult.cost_usd

Security:

  • Never store credentials or API keys in memory
  • Use memory for context, decisions, and preferences only
  • Pin critical business rules to prevent accidental pruning

#Common Patterns

Long-Running Agent with Shared Memory:

python
# Multiple agents share the same memory workspace
shared_memory = MemoryPlugin(workspace="support_team")
 
agent1 = Agent("Support Agent A", tools=[shared_memory])
agent2 = Agent("Support Agent B", tools=[shared_memory])
 
# Agent A stores customer context
await agent1.start()
await agent1.run("Customer prefers email communication over phone")
await agent1.stop()
 
# Agent B can recall that context later
await agent2.start()
result = await agent2.run("How does this customer prefer to be contacted?")
# Agent B finds the information stored by Agent A
await agent2.stop()

Research Assistant with Global Knowledge:

python
# Global scope for persistent knowledge across all projects
memory = MemoryPlugin(
    scope="global",
    workspace="research_knowledge",
    auto_curate="on_stop"
)
 
agent = Agent(
    name="Research Assistant",
    prompt="You are a research assistant. Build a knowledge base over time.",
    tools=[memory]
)
 
await agent.start()
 
# Store research findings
await agent.run("Remember: The Pythagorean theorem applies to right triangles")
await agent.run("Remember: Python uses 0-based indexing for lists")
 
# Knowledge persists across projects and sessions
await agent.stop()

Workflow Integration:

python
from daita import Agent
from daita.core import Workflow
from daita.plugins import MemoryPlugin
 
# Shared memory across workflow agents
memory = MemoryPlugin(workspace="data_pipeline")
 
# Each agent in workflow uses shared memory
data_agent = Agent("Data Collector", tools=[memory])
analyst_agent = Agent("Data Analyst", tools=[memory])
reporter_agent = Agent("Report Generator", tools=[memory])
 
workflow = Workflow("Analytics Pipeline")
workflow.add_agent(data_agent)
workflow.add_agent(analyst_agent)
workflow.add_agent(reporter_agent)
 
# Agents share context through memory as workflow executes
await workflow.run()

#Error Handling

python
from daita import Agent
from daita.plugins import MemoryPlugin
 
agent = None  # defined up front so the finally block is safe if setup fails
try:
    memory = MemoryPlugin(
        workspace="my_workspace",
        curation_provider="openai"
    )
    agent = Agent("Assistant", tools=[memory])
    await agent.start()
 
    result = await agent.run("Remember important information")
 
except RuntimeError as e:
    if "Missing required environment variables" in str(e):
        print("Set DAITA_ORG_ID and DAITA_PROJECT_NAME for cloud memory")
    elif "not installed" in str(e):
        print("Install embedding provider: pip install openai")
    else:
        print(f"Memory error: {e}")
finally:
    # Only stop the agent if it was actually created
    if agent is not None:
        await agent.stop()

#Troubleshooting

| Issue | Solution |
|---|---|
| openai not installed | pip install openai (or anthropic, for embeddings) |
| Cloud memory initialization fails | Set DAITA_ORG_ID and DAITA_PROJECT_NAME env vars |
| Empty recall results | Lower score_threshold or check if memories exist |
| High curation costs | Use auto_curate="manual" to control when curation runs |
| Memories not persisting | Check workspace and scope configuration |
| Shared memory not working | Ensure same workspace parameter across agents |
| Curation not running | Check auto_curate setting, verify LLM provider configured |

#Next Steps