Searching & Querying

How to search archival memory effectively

Search result format

Each search result an agent receives contains:

  • content - The stored text
  • tags - Associated tags
  • timestamp - When the memory was created
  • relevance - Scoring details: rrf_score, vector_rank, and fts_rank

Letta uses hybrid search that combines semantic (vector) search with keyword (full-text) search and ranks the merged results using Reciprocal Rank Fusion (RRF). A higher rrf_score indicates a more relevant result.
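To make the ranking step concrete, here is a minimal sketch of Reciprocal Rank Fusion in its standard form, where each result scores the sum of 1 / (k + rank) over the lists it appears in. The function name, the memory IDs, and the constant k=60 (a commonly used default) are assumptions for illustration; the source does not say which constant or weighting Letta uses internally.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score) pairs.

    k=60 is a commonly used default, assumed here for illustration only.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the two search modes (memory IDs are made up).
vector_ranking = ["m2", "m1", "m3"]  # semantic (vector) order
fts_ranking = ["m1", "m4", "m2"]     # keyword (full-text) order

fused = reciprocal_rank_fusion([vector_ranking, fts_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
```

A memory that ranks well in both lists (m1 here) outranks one that leads only a single list, which is why hybrid search tends to favor results supported by both semantic and keyword evidence.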

Writing effective queries

By default, Letta embeds memories with OpenAI’s text-embedding-3-small model, which handles natural language questions well. Agents can use several query styles:

Natural language questions work best:

# What the agent does (agent tool call)
archival_memory_search(query="How does the test work?")
# Returns: "The Voight-Kampff test measures involuntary emotional responses..."

Keywords also work:

# What the agent does (agent tool call)
archival_memory_search(query="replicant lifespan")
# Returns memories containing both keywords and semantically related concepts

Concept-based queries leverage semantic understanding:

# What the agent does (agent tool call)
archival_memory_search(query="artificial memories")
# Returns: "...experimental replicant with implanted memories..."
# (semantic match despite different terminology)

Pagination: Each search returns a page containing multiple results. If an agent doesn’t paginate correctly, instruct it to adjust the page parameter, or remind it to iterate through the remaining pages of results.
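The iteration pattern described above can be sketched as a loop that requests pages until one comes back empty. Everything here is hypothetical: `search_page` stands in for whatever call performs the search, `fake_search_page` is an in-memory stub, and zero-based page numbers and a page size of 2 are assumptions made for the demonstration.

```python
from typing import Callable, Iterator, List


def iterate_results(
    search_page: Callable[[str, int], List[str]],
    query: str,
    max_pages: int = 10,
) -> Iterator[str]:
    """Yield results page by page until an empty page or max_pages is reached."""
    for page in range(max_pages):
        results = search_page(query, page)
        if not results:
            break  # an empty page means there is nothing left to read
        yield from results


# In-memory stand-in for a real search call (hypothetical, for illustration).
FAKE_ARCHIVE = [f"memory {i}" for i in range(5)]
PAGE_SIZE = 2


def fake_search_page(query: str, page: int) -> List[str]:
    start = page * PAGE_SIZE
    return FAKE_ARCHIVE[start:start + PAGE_SIZE]


all_results = list(iterate_results(fake_search_page, "replicant"))
```

The `max_pages` cap keeps a misbehaving backend from causing an unbounded loop.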

Filtering by time

Agents can search by date ranges:

# What the agent does (agent tool call)

# Recent memories
archival_memory_search(
    query="test results",
    start_datetime="2025-09-29T00:00:00"
)

# Specific time window
archival_memory_search(
    query="replicant cases",
    start_datetime="2025-09-29T00:00:00",
    end_datetime="2025-09-30T23:59:59"
)

Agent datetime awareness:

  • Agents know the current day but not the current time
  • Agents can see timestamps of messages they’ve received
  • Agents cannot control insertion timestamps (automatic)
  • Developers can backdate memories via SDK with created_at
  • Time filtering enables queries like “what did we discuss last week?”
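As an illustration of how a phrase like “last week” can be turned into a date range, the sketch below computes the previous Monday-to-Sunday week from the current day and formats the bounds in the same ISO style as the start_datetime and end_datetime values shown earlier. The helper name and the Monday-to-Sunday definition of a week are assumptions; an agent or application may reasonably interpret “last week” differently.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def last_week_range(today: date) -> Tuple[str, str]:
    """Return ISO start/end strings for the previous Monday-to-Sunday week."""
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    start = datetime.combine(last_monday, time.min)
    # End one second before this week begins, i.e. Sunday 23:59:59.
    end = datetime.combine(this_monday, time.min) - timedelta(seconds=1)
    return start.isoformat(), end.isoformat()


# Only the current day is needed, not the current time.
start_datetime, end_datetime = last_week_range(date(2025, 10, 1))
# start_datetime == "2025-09-22T00:00:00"
# end_datetime   == "2025-09-28T23:59:59"
```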

Tags and organization

Tags help agents organize and filter archival memories. Agents always know what tags exist in their archive since tag lists are compiled into the context window.

Common tag patterns:

  • user_info, professional, personal_history
  • documentation, technical, reference
  • conversation, milestone, event
  • company_policy, procedure, guideline

Tag search modes:

  • Match any tag
  • Match all tags
  • Combine tag filters with date ranges
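The difference between the “match any” and “match all” modes can be sketched with plain set operations. This is an illustrative model only: the function name and the `match_all` flag are hypothetical and are not taken from the Letta tool or SDK.

```python
from typing import Iterable


def matches_tags(memory_tags: Iterable[str], filter_tags: Iterable[str],
                 match_all: bool = False) -> bool:
    """Return True if a memory's tags satisfy the filter.

    match_all=False: at least one filter tag is present (match any).
    match_all=True:  every filter tag is present (match all).
    """
    memory, wanted = set(memory_tags), set(filter_tags)
    if not wanted:
        return True  # an empty filter excludes nothing
    return wanted <= memory if match_all else bool(wanted & memory)


memory_tags = ["technical", "replicant", "nexus-6"]

any_hit = matches_tags(memory_tags, ["technical", "company_policy"])
all_hit = matches_tags(memory_tags, ["technical", "company_policy"], match_all=True)
all_hit_narrow = matches_tags(memory_tags, ["technical", "nexus-6"], match_all=True)
```

Here `any_hit` is True because one filter tag matches, `all_hit` is False because company_policy is missing, and `all_hit_narrow` is True because both filter tags are present.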

Example of organized tagging:

# What the agent does (agent tool call)

# Atomic memory with precise tags
archival_memory_insert(
    content="Nexus-6 replicants have a four-year lifespan",
    tags=["technical", "replicant", "nexus-6"]
)

# Later, easy retrieval
archival_memory_search(
    query="how long do replicants live",
    tags=["technical"]
)

Performance and scale

Archival memory has no practical size limits and remains fast at scale:

Letta Cloud: Uses TurboPuffer for extremely fast semantic search, even with hundreds of thousands of memories.

Self-hosted: Uses pgvector (PostgreSQL) for vector search. Performance scales well with proper indexing.

Letta Desktop: Uses SQLite with vector search extensions. Suitable for personal use cases.

No matter the backend, archival memory scales to large archives without performance degradation.

Embedding models and search quality

Archival search quality depends on the agent’s embedding model:

Letta Cloud: All agents use text-embedding-3-small, which is optimized for most use cases. This model cannot be changed.

Self-hosted: Embedding model is pinned to the agent at creation. The default text-embedding-3-small is sufficient for nearly all use cases.

Changing embedding models (self-hosted only)

To change an agent’s embedding model, you must:

  1. List and export all archival memories
  2. Delete all archival memories
  3. Update the agent’s embedding model
  4. Re-insert all memories (they’ll be re-embedded)

Changing embedding models is a destructive operation. Export your archival memories first.
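The four steps above can be sketched as follows. This is not the Letta SDK: `FakeStore` and its methods are hypothetical in-memory stand-ins that show the order of operations, in particular that the export is written to disk before anything is deleted. The sketch also carries created_at through the export so that original timestamps could be restored on re-insert.

```python
import json
import tempfile
from pathlib import Path


class FakeStore:
    """Hypothetical in-memory stand-in for an agent's archival memory."""

    def __init__(self, model):
        self.model = model
        self._passages = {}
        self._next_id = 0

    def insert(self, content, tags, created_at=None):
        pid = f"passage-{self._next_id}"
        self._next_id += 1
        self._passages[pid] = {
            "id": pid, "content": content, "tags": list(tags),
            "created_at": created_at, "embedded_with": self.model,
        }
        return pid

    def list_passages(self):
        return list(self._passages.values())

    def delete(self, pid):
        del self._passages[pid]

    def set_embedding_model(self, model):
        self.model = model


def migrate_embedding_model(store, new_model, backup_path):
    # 1. List and export all memories; write the backup before deleting anything.
    exported = [
        {"content": p["content"], "tags": p["tags"], "created_at": p["created_at"]}
        for p in store.list_passages()
    ]
    Path(backup_path).write_text(json.dumps(exported, indent=2))
    # 2. Delete all memories.
    for passage in store.list_passages():
        store.delete(passage["id"])
    # 3. Update the embedding model.
    store.set_embedding_model(new_model)
    # 4. Re-insert everything so it is re-embedded with the new model.
    for item in exported:
        store.insert(item["content"], item["tags"], item["created_at"])
    return len(exported)


store = FakeStore("old-model")
store.insert("Nexus-6 replicants have a four-year lifespan",
             ["technical", "replicant"], "2025-09-29T00:00:00")
store.insert("The Voight-Kampff test measures involuntary emotional responses",
             ["technical", "testing"], "2025-09-30T00:00:00")

with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "archival_backup.json"
    migrated = migrate_embedding_model(store, "new-model", backup)
    backup_count = len(json.loads(backup.read_text()))
```

Keeping the backup file until the re-inserted memories have been verified gives a recovery path if any step fails partway through.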

Programmatic access (SDK)

Developers can manage archival memory programmatically via the SDK:

// Assumes `client` is an initialized SDK client, `agent` is an existing agent,
// and `passageId` is the ID of a stored memory.

// Insert a memory
await client.agents.passages.insert(agent.id, {
  content: "The Voight-Kampff test requires a minimum of 20 cross-referenced questions",
  tags: ["technical", "testing", "protocol"]
});

// Search memories
const results = await client.agents.passages.search(agent.id, {
  query: "testing procedures",
  tags: ["protocol"],
  page: 0
});

// List all memories
const passages = await client.agents.passages.list(agent.id, {
  limit: 100
});

// Get a specific memory
const passage = await client.agents.passages.get(agent.id, passageId);

Next steps