RAG with Letta

Connect your custom RAG pipeline to Letta agents

If you have an existing Retrieval-Augmented Generation (RAG) pipeline, you can connect it to your Letta agents. While Letta provides built-in features like archival memory, you can integrate your own RAG pipeline just as you would with any LLM API. This gives you full control over your data and retrieval methods.

What is RAG?

Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant information from external data sources before generating an answer. Instead of relying solely on the model’s training data, a RAG system:

  1. Takes a user query.
  2. Searches a vector database for relevant documents.
  3. Includes those documents in the LLM’s context.
  4. Generates an informed response based on the retrieved information.
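The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` is a hypothetical stand-in that uses word counts in place of a real embedding model, `DOCUMENTS` is an in-memory list in place of a vector database, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for documents stored in a vector database.
DOCUMENTS = [
    "Letta agents can store long-term facts in archival memory.",
    "Qdrant, ChromaDB, and MongoDB Atlas are vector databases.",
    "Bananas are rich in potassium.",
]

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    """Step 2: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    """Step 3: place the retrieved documents in the context ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

query = "Which vector databases are mentioned?"       # Step 1: take the user query
prompt = build_prompt(query, retrieve(query, k=1))    # Steps 2 and 3
# Step 4 would send `prompt` to an LLM; that call is omitted here.
```

In a real pipeline, `embed` and `retrieve` would be replaced by calls to your embedding model and vector database; the shape of the flow stays the same.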

Choosing Your RAG Approach

Letta supports two approaches for integrating RAG, depending on how much control you want over the retrieval process.

| Aspect | Simple RAG | Agentic RAG |
| --- | --- | --- |
| Who Controls Retrieval | Your application controls when retrieval happens and what the retrieval query is. | The agent decides when to retrieve and what query to use. |
| Context Inclusion | You can always include retrieval results in the context. | Retrieval happens only when the agent determines it’s needed. |
| Latency | Lower: typically single-hop, as the agent doesn’t need to do a tool call. | Higher: requires tool calls for retrieval. |
| Client Code | More complex, as it handles retrieval logic. | Simpler, as it just sends the user query. |
| Customization | You have full control via your retrieval function. | You have full control via your custom tool definition. |

Both approaches work with any vector database. Our tutorials include examples for ChromaDB, MongoDB Atlas, and Qdrant.
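The difference in control flow between the two approaches in the comparison above can be illustrated with a small sketch. Everything here is hypothetical and deliberately avoids the Letta SDK: `search_docs` stands in for your retrieval function, and `StubAgent` imitates an agent's decision to call a tool with a simple keyword rule, where a real agent would let the model decide.

```python
def search_docs(query):
    """Hypothetical retrieval function; a real one would query your vector database."""
    knowledge = {
        "refund": "Refunds are accepted within 30 days.",
        "shipping": "Orders ship in 2 business days.",
    }
    hits = [text for key, text in knowledge.items() if key in query.lower()]
    return "\n".join(hits) or "No results."

# Simple RAG: the application retrieves on every turn and prepends the results,
# so the agent receives the context without making a tool call.
def simple_rag_message(user_query):
    context = search_docs(user_query)
    return f"Context:\n{context}\n\nQuestion: {user_query}"

# Agentic RAG: the application sends the raw query; the agent may call the tool.
class StubAgent:
    """Hypothetical stand-in for an LLM agent with a retrieval tool attached."""

    def __init__(self, tools):
        self.tools = tools
        self.tool_calls = []  # record of (tool name, query) pairs

    def send(self, user_query):
        # A real agent decides via the model; this stub uses a keyword rule.
        needs_lookup = any(w in user_query.lower() for w in ("refund", "shipping"))
        if needs_lookup:
            self.tool_calls.append(("search_docs", user_query))
            result = self.tools["search_docs"](user_query)
            return f"(answer grounded in: {result})"
        return "(answer without retrieval)"

agent = StubAgent({"search_docs": search_docs})
```

With Simple RAG the client calls `simple_rag_message(...)` and forwards the result, which is why the client code carries the retrieval logic. With Agentic RAG the client only calls `agent.send(...)`, and retrieval shows up as an extra tool call when the agent chooses to make one.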
