Searching & Querying
Search result format
What agents receive: Each result contains:
- `content`: the stored text
- `tags`: associated tags
- `timestamp`: when the memory was created
- `relevance`: scoring with `rrf_score`, `vector_rank`, and `fts_rank`

Letta uses hybrid search combining semantic (vector) and keyword (full-text) search, ranked using Reciprocal Rank Fusion (RRF). A higher `rrf_score` means more relevant.
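Reciprocal Rank Fusion can be sketched in a few lines: a result's fused score is the sum of 1/(k + rank) over each ranking it appears in, where k is a smoothing constant (commonly 60; the exact constant Letta uses is not stated here):

```python
def rrf_score(ranks, k=60):
    """Fuse 1-based ranks from multiple searches into one score.

    ranks: the document's 1-based positions in each ranking
    (e.g. its vector_rank and fts_rank); omit rankings it missed.
    """
    return sum(1.0 / (k + r) for r in ranks)

# A document ranked 1st by vector search and 3rd by full-text search
# narrowly outscores one ranked 2nd in both.
doc_a = rrf_score([1, 3])  # 1/61 + 1/63
doc_b = rrf_score([2, 2])  # 1/62 + 1/62
```

RRF rewards documents that appear near the top of either ranking without requiring the two scoring scales to be comparable, which is why it suits hybrid vector-plus-keyword search.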
Writing effective queries
Letta uses OpenAI’s `text-embedding-3-small` model, which handles natural language questions well. Agents can use various query styles:
- Natural language questions work best
- Keywords also work
- Concept-based queries leverage semantic understanding
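As a rough illustration of why multiple query styles work, here is a toy token-overlap scorer standing in for the keyword (full-text) side; the semantic side would use `text-embedding-3-small` vectors instead, which also match paraphrases (the stored text and queries below are illustrative examples, not from Letta's docs):

```python
def keyword_score(query, text):
    """Toy full-text relevance: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q)

memory = "The user works as a software engineer and enjoys hiking"

# Keyword-style query: matches directly on shared tokens.
print(keyword_score("software engineer", memory))  # 1.0
# Natural-language question: only partial token overlap here, but in
# real hybrid search the vector side still ranks this memory highly.
print(keyword_score("what does the user do for work", memory))
```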
Pagination: Agents receive multiple results per search. If an agent doesn’t paginate correctly, you can instruct it to adjust the `page` parameter or remind it to iterate through results.
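A sketch of the pagination loop an agent should follow, shown against a stub search function (the `page` parameter mirrors the one mentioned above; the page size and function shape are illustrative):

```python
PAGE_SIZE = 2
ALL_RESULTS = ["r1", "r2", "r3", "r4", "r5"]  # pretend ranked results

def archival_search(query, page=0):
    """Stub: return one page of results; an empty list past the end."""
    start = page * PAGE_SIZE
    return ALL_RESULTS[start:start + PAGE_SIZE]

def collect_all(query):
    """Iterate pages until an empty batch signals the end."""
    results, page = [], 0
    while True:
        batch = archival_search(query, page=page)
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results

print(collect_all("anything"))  # all five results, gathered across three pages
```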
Filtering by time
Agents can search by date ranges:
Agent datetime awareness:
- Agents know the current day but not the current time
- Agents can see timestamps of messages they’ve received
- Agents cannot control insertion timestamps (automatic)
- Developers can backdate memories via the SDK with `created_at`
- Time filtering enables queries like “what did we discuss last week?”
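A minimal sketch of the "last week" date-range filter described above, over an in-memory list (the field names and fixed dates are illustrative; in Letta the `created_at` timestamp is set automatically unless a developer backdates it):

```python
from datetime import datetime, timedelta

memories = [
    {"content": "Discussed project roadmap", "created_at": datetime(2024, 5, 1)},
    {"content": "User mentioned a new job",  "created_at": datetime(2024, 5, 8)},
]

def search_by_range(memories, start, end):
    """Return memories whose created_at falls inside [start, end)."""
    return [m for m in memories if start <= m["created_at"] < end]

# "What did we discuss last week?" relative to a fixed 'now':
now = datetime(2024, 5, 10)
last_week = search_by_range(memories, now - timedelta(days=7), now)
print([m["content"] for m in last_week])  # only the May 8 memory qualifies
```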
Tags and organization
Tags help agents organize and filter archival memories. Agents always know what tags exist in their archive since tag lists are compiled into the context window.
Common tag patterns:
- `user_info`, `professional`, `personal_history`
- `documentation`, `technical`, `reference`
- `conversation`, `milestone`, `event`
- `company_policy`, `procedure`, `guideline`
Tag search modes:
- Match any tag
- Match all tags
- Filter by date ranges
Example of organized tagging:
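The match-any and match-all tag modes above can be sketched over an in-memory list (the tag names reuse the common patterns listed earlier; the matching logic is illustrative, not Letta's implementation):

```python
memories = [
    {"content": "Prefers async standups",  "tags": ["company_policy", "procedure"]},
    {"content": "Started at Acme in 2021", "tags": ["user_info", "professional"]},
    {"content": "Promoted to team lead",   "tags": ["professional", "milestone"]},
]

def search_by_tags(memories, tags, mode="any"):
    """mode='any': at least one tag matches; mode='all': every tag must match."""
    wanted = set(tags)
    if mode == "any":
        return [m for m in memories if wanted & set(m["tags"])]
    return [m for m in memories if wanted <= set(m["tags"])]

# Match any: both memories tagged `professional` come back.
print(len(search_by_tags(memories, ["professional", "milestone"], mode="any")))  # 2
# Match all: only the promotion memory carries both tags.
print(len(search_by_tags(memories, ["professional", "milestone"], mode="all")))  # 1
```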
Performance and scale
Archival memory has no practical size limits and remains fast at scale:
Letta Cloud: Uses TurboPuffer for extremely fast semantic search, even with hundreds of thousands of memories.
Self-hosted: Uses pgvector (PostgreSQL) for vector search. Performance scales well with proper indexing.
Letta Desktop: Uses SQLite with vector search extensions. Suitable for personal use cases.
No matter the backend, archival memory scales to large archives without performance degradation.
Embedding models and search quality
Archival search quality depends on the agent’s embedding model:
Letta Cloud: All agents use `text-embedding-3-small`, which is optimized for most use cases. This model cannot be changed.
Self-hosted: The embedding model is pinned to the agent at creation. The default `text-embedding-3-small` is sufficient for nearly all use cases.
Changing embedding models (self-hosted only)
To change an agent’s embedding model, you must:
- List and export all archival memories
- Delete all archival memories
- Update the agent’s embedding model
- Re-insert all memories (they’ll be re-embedded)
Changing embedding models is a destructive operation. Export your archival memories first.
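The four migration steps can be sketched against a stub client (the class and its method names are hypothetical stand-ins, not the real Letta SDK; the point is the order of operations, and in particular exporting before the destructive delete):

```python
class StubClient:
    """In-memory stand-in for an archival-memory API (hypothetical methods)."""
    def __init__(self):
        self.embedding_model = "text-embedding-3-small"
        self.passages = []

    def list_passages(self):
        return list(self.passages)

    def delete_all_passages(self):
        self.passages.clear()

    def set_embedding_model(self, model):
        self.embedding_model = model

    def insert(self, text):
        # Embedding happens on insert, using whatever model is current.
        self.passages.append({"text": text, "model": self.embedding_model})

client = StubClient()
client.insert("memory one")
client.insert("memory two")

# 1. List and export all archival memories first (destructive step ahead).
backup = [p["text"] for p in client.list_passages()]
# 2. Delete all archival memories.
client.delete_all_passages()
# 3. Update the agent's embedding model (target model name is illustrative).
client.set_embedding_model("text-embedding-3-large")
# 4. Re-insert; each memory is re-embedded with the new model.
for text in backup:
    client.insert(text)
```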
Programmatic access (SDK)
Developers can manage archival memory programmatically via the SDK: