Agentic RAG with Letta

In the Agentic RAG approach, we delegate the retrieval process to the agent itself. Instead of your application deciding what to search for, we provide the agent with a custom tool that allows it to query your vector database directly. This makes the agent more autonomous and your client-side code much simpler.

By the end of this tutorial, you’ll have a research assistant that autonomously decides when to search your vector database and what queries to use.

Prerequisites

To follow along, you need free accounts for:

Letta - To access the agent development platform
Hugging Face - For generating embeddings (MongoDB and Qdrant users only)
One of the following vector databases:
- ChromaDB Cloud for a hosted vector database
- MongoDB Atlas for vector search with MongoDB
- Qdrant Cloud for a high-performance vector database

You will also need Python 3.8+ or Node.js v18+ and a code editor.

MongoDB and Qdrant users: This guide uses Hugging Face’s Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta’s sandbox environment.

Getting Your API Keys

We’ll need API keys for Letta and your chosen vector database.

Get your Letta API Key

Create a Letta Account

If you don’t have one, sign up for a free account at letta.com.

Navigate to API Keys

Once logged in, click on API keys in the sidebar.

Create and Copy Your Key

Click + Create API key, give it a descriptive name, and click Confirm. Copy the key and save it somewhere safe.

Get your Vector Database credentials

ChromaDB

Create a ChromaDB Cloud Account

Create a New Database

From your dashboard, create a new database.

Get Your API Key and Host

In your project settings, you’ll find your API Key, Tenant, Database, and Host URL. We’ll need all of these for our scripts.

Get your Hugging Face API Token (MongoDB & Qdrant users)

Create a Hugging Face Account

Create Access Token

Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).

Generate New Token

Click New token, give it a name (e.g., “Letta RAG Demo”), select the Read role, and click Create token. Copy the token and save it securely.

The free tier includes 30,000 API requests per month, which is more than enough for development and testing.

Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:

ChromaDB

LETTA_API_KEY="..."
CHROMA_API_KEY="..."
CHROMA_TENANT="..."
CHROMA_DATABASE="..."
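Before running the later scripts, it can help to confirm that every expected variable is actually set. The sketch below is a minimal check that uses only the standard library. The helper name `missing_env_vars` is hypothetical, and the variable list assumes the ChromaDB setup shown above.

```python
import os

# Variable names taken from the ChromaDB .env example above.
REQUIRED_VARS = ["LETTA_API_KEY", "CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE"]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # If you use python-dotenv, call load_dotenv() before this check.
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.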

Step 1: Set Up the Vector Database

First, we need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo: “Attention Is All You Need” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

Before we begin, let’s set up our development environment:

# Create a Python virtual environment to keep dependencies isolated
$ python -m venv venv
$ source venv/bin/activate  # On Windows, use: venv\Scripts\activate

TypeScript users must update package.json to use ES modules:

"type": "module"

Download the research papers using curl with the -L flag to follow redirects:

curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf
curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf

Verify the PDFs downloaded correctly:

file 1706.03762.pdf 1810.04805.pdf

You should see output indicating these are PDF documents, not HTML files.
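If the `file` command is not available (for example on Windows), the same check can be approximated in Python by looking at the first bytes of each download, since PDF files begin with the marker `%PDF-`. This is a minimal sketch; `looks_like_pdf` is a hypothetical helper, not part of the tutorial scripts.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF magic bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


if __name__ == "__main__":
    for name in ["1706.03762.pdf", "1810.04805.pdf"]:
        if not Path(name).exists():
            print(f"{name}: not found")
        elif looks_like_pdf(name):
            print(f"{name}: looks like a PDF")
        else:
            print(f"{name}: does not look like a PDF (possibly an HTML error page)")
```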

Install the necessary packages for your chosen database:

ChromaDB

# requirements.txt
letta-client
chromadb
pypdf
python-dotenv

For Python, install with:

$ pip install -r requirements.txt

Now create a setup.py or setup.ts file to load the PDFs, split them into chunks, and ingest them into your database:

ChromaDB

import os

import chromadb
import pypdf
from dotenv import load_dotenv

load_dotenv()


def main():
    # Connect to ChromaDB Cloud using the credentials from .env
    client = chromadb.CloudClient(
        tenant=os.getenv("CHROMA_TENANT"),
        database=os.getenv("CHROMA_DATABASE"),
        api_key=os.getenv("CHROMA_API_KEY"),
    )

    # Create the collection, or reuse it if it already exists
    collection = client.get_or_create_collection("rag_collection")

    # Ingest each PDF, storing one document (chunk) per page
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if not text or not text.strip():
                continue  # Skip pages with no extractable text
            collection.add(
                ids=[f"{pdf_file}-{i}"],
                documents=[text],
            )

    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count()}")


if __name__ == "__main__":
    main()

Run the script from your terminal:

$ python setup.py
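The setup script stores one document per PDF page. If pages prove too coarse for retrieval, one common option is to split each page into overlapping character windows before adding them to the collection. The following is a minimal sketch of that idea. `chunk_text` and its `size` and `overlap` parameters are hypothetical and not part of the original script.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split `text` into windows of at most `size` characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
        if start + size >= len(text):
            break  # The current window already reaches the end of the text
    return chunks
```

Each chunk would then need its own unique ID, for example one built from the file name, the page index, and the chunk index.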

If you are using MongoDB Atlas, you must manually create a vector search index by following the steps below.

Create the Vector Search Index (MongoDB Atlas Only)

MongoDB Atlas users: The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow these steps carefully.

Navigate to Atlas Search

Create Search Index

Click “Create Search Index”, then choose “JSON Editor” (not “Visual Editor”).

Select Database and Collection

Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME)
Collection: Select rag_collection

Name and Configure Index

Index Name: Enter vector_index (this exact name is required by the code)
Paste this JSON definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}

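Note: 384 is the embedding dimension of Hugging Face’s BAAI/bge-small-en-v1.5 model. If you use a different embedding model, numDimensions must match that model’s output size.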

Create and Wait

Click “Create Search Index”. The index will take a few minutes to build. Wait until the status shows as “Active” before proceeding.

Your vector database is now populated with research paper content and ready to query.
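For intuition about the `"similarity": "cosine"` setting in the MongoDB Atlas index definition above, the sketch below computes cosine similarity between two vectors in plain Python. It is illustrative only: the database performs this comparison internally, and `cosine_similarity` is a hypothetical helper rather than part of any client library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The length check mirrors why the index dimension has to match the embedding model: vectors of different sizes cannot be compared.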

Step 2: Create a Custom Search Tool

A Letta tool is a Python function that your agent can call. We’ll create a function that searches your vector database and returns the results. Letta handles the complexities of exposing this function to the agent securely.
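To see roughly what information a plain Python function exposes to a framework, the sketch below introspects a stub with the same signature as the search tool built later in this step. It only illustrates the standard `inspect` and `typing` modules. It is not Letta’s actual schema generation, and `describe_tool` is a hypothetical helper.

```python
import inspect
import typing


def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """Searches the research paper collection for a given query."""
    return ""  # Stub body; the real implementation comes later in this step.


def describe_tool(func):
    """Collect the name, summary line, and typed parameters of `func`."""
    hints = typing.get_type_hints(func)
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        params[name] = {
            # Fall back to "any" when a parameter has no type hint
            "type": getattr(hints.get(name), "__name__", "any"),
            "required": param.default is inspect.Parameter.empty,
        }
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }


print(describe_tool(search_research_papers))
```

This is one reason clear type hints and a descriptive docstring matter: they are the main signals available for telling the agent what the tool does and how to call it.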

TypeScript users: Letta tools execute in Python, even when called from TypeScript. Create a tools.ts file that exports the Python code as a string constant, which you’ll use in Step 3 to create the tool.

Create a new file named tools.py (Python) or tools.ts (TypeScript) with the appropriate implementation for your database:

ChromaDB

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found, separated by blank lines.
    """
    # Imports live inside the function because this tool code is executed
    # on the Letta server, not in your local module.
    import os

    import chromadb

    # The ChromaDB credentials are expected as environment variables.
    api_key = os.getenv("CHROMA_API_KEY")
    tenant = os.getenv("CHROMA_TENANT")
    database = os.getenv("CHROMA_DATABASE")

    if not all([api_key, tenant, database]):
        raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.")

    client = chromadb.CloudClient(
        tenant=tenant,
        database=database,
        api_key=api_key
    )

    collection = client.get_or_create_collection("rag_collection")

    try:
        results = collection.query(
            query_texts=[query_text],
            n_results=n_results
        )

        # results["documents"] holds one list of matches per query text
        documents = results["documents"][0]
        if not documents:
            return "No relevant documents found."
        return "\n\n".join(documents)
    except Exception as e:
        return f"Tool failed with error: {e}"

This function takes a query, connects to your database, retrieves the most relevant documents, and returns them as a single string.
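When `n_results` is greater than 1, it can help the agent to see which source each passage came from. One possible way to build the returned string is sketched below, assuming the result has the `{"ids": [[...]], "documents": [[...]]}` shape that ChromaDB’s `collection.query` returns for a single query text. `format_results` is a hypothetical helper.

```python
def format_results(results):
    """Join the documents of the first query in a Chroma-style result dict."""
    documents = (results.get("documents") or [[]])[0]
    ids = (results.get("ids") or [[]])[0]
    if not documents:
        return "No relevant documents found."
    sections = []
    for index, text in enumerate(documents):
        # Label each passage with its ID so the agent can tell sources apart
        label = ids[index] if index < len(ids) else f"result-{index}"
        sections.append(f"[{label}]\n{text}")
    return "\n\n".join(sections)
```

Because the tool runs in Letta’s environment rather than in your local module, logic like this would need to live inside the tool function itself rather than in a separate module-level helper.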

Step 3: Configure an Agentic Research Assistant

Next, we’ll create a new agent. This agent will have a specific persona that instructs it on how to behave and, most importantly, it will be equipped with our new search tool.

Create a file named create_agentic_agent.py (Python) or create_agentic_agent.ts (TypeScript):

import os
from letta_client import Letta
from dotenv import load_dotenv
from tools import search_research_papers

load_dotenv()

# Initialize the Letta client
client = Letta(token=os.getenv("LETTA_API_KEY"))

# Create a tool from our Python function
search_tool = client.tools.create_from_function(func=search_research_papers)

# Define the agent's persona
persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool."""

# Create the agent with the tool attached
agent = client.agents.create(
    name="Agentic RAG Assistant",
    description="A smart agent that can search a vector database to answer questions.",
    memory_blocks=[
        {
            "label": "persona",
            "value": persona
        }
    ],
    tools=[search_tool.name]
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

TypeScript users: Notice how the TypeScript version imports searchResearchPapersToolCode from tools.ts (the file you created in Step 2). This keeps the code organized, just like the Python version imports from tools.py.

Run this script once to create the agent in your Letta project:

$ python create_agentic_agent.py

Configure Tool Dependencies and Environment Variables

For the tool to work within Letta’s environment, we need to configure its dependencies and environment variables through the Letta dashboard.

Find your agent

Navigate to your Letta dashboard and find the “Agentic RAG Assistant” agent you just created.

Access the ADE

Click on your agent to open the Agent Development Environment (ADE).

Configure Dependencies

In the ADE, select Tools from the sidebar, find and click on the search_research_papers tool, then click on the Dependencies tab.

Add the following dependencies based on your database:

ChromaDB

chromadb

Configure Environment Variables

In the same tool configuration, navigate to Simulator > Environment.

Add the following environment variables with their corresponding values from your .env file:

ChromaDB

CHROMA_API_KEY
CHROMA_TENANT
CHROMA_DATABASE

Make sure to click the upload button next to each environment variable so that the agent is updated with it.

Now, when the agent calls this tool, Letta’s execution environment will know to install the necessary dependencies and will have access to the necessary credentials to connect to your database.

Step 4: Let the Agent Lead the Conversation

With the agentic setup, our client-side code becomes very simple. We no longer need to retrieve context ourselves; we just send the user’s raw question to the agent and let it handle the rest.

Create the agentic_rag.py or agentic_rag.ts script:

import os
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize client
letta_client = Letta(token=os.getenv("LETTA_API_KEY"))

AGENT_ID = "your-agentic-agent-id"  # Replace with your new agent ID

def main():
    while True:
        user_query = input("\nAsk a question about the research papers: ")
        if user_query.lower() in ['exit', 'quit']:
            break

        response = letta_client.agents.messages.create(
            agent_id=AGENT_ID,
            messages=[{"role": "user", "content": user_query}]
        )

        for message in response.messages:
            if message.message_type == 'assistant_message':
                print(f"\nAgent: {message.content}")

if __name__ == "__main__":
    main()

Replace your-agentic-agent-id with the ID of the new agent you just created.
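As an alternative to editing the script by hand, the agent ID could be read from the environment, which also guards against accidentally running with the placeholder. This is a minimal sketch. The variable name `LETTA_AGENT_ID` and the helper `resolve_agent_id` are assumptions for illustration and are not defined anywhere else in this tutorial.

```python
import os

PLACEHOLDER = "your-agentic-agent-id"


def resolve_agent_id(env=None):
    """Read the agent ID from LETTA_AGENT_ID (a hypothetical variable name)."""
    env = os.environ if env is None else env
    agent_id = (env.get("LETTA_AGENT_ID") or "").strip()
    if not agent_id or agent_id == PLACEHOLDER:
        raise RuntimeError(
            "Set LETTA_AGENT_ID to the ID printed by create_agentic_agent.py."
        )
    return agent_id
```

With this in place, the script could set `AGENT_ID = resolve_agent_id()` after calling `load_dotenv()`, with the ID stored in the same .env file as the other credentials.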

When you run this script, the agent receives the question, understands from its persona that it needs to search for information, calls the search_research_papers tool, gets the context, and then formulates an answer. All the RAG logic is handled by the agent, not your application.
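To observe that flow rather than only the final answer, the loop could report every message in the response instead of filtering for `assistant_message`. The sketch below uses stand-in objects so it runs without a Letta connection. The message type names `tool_call_message` and `tool_return_message`, and the attributes `tool_call.name`, `tool_call.arguments`, and `tool_return`, are assumptions about the response shape that should be checked against the SDK version you have installed.

```python
from types import SimpleNamespace


def summarize_messages(messages):
    """Return one readable line per message, in the order received."""
    lines = []
    for message in messages:
        kind = getattr(message, "message_type", "unknown")
        if kind == "tool_call_message":
            call = message.tool_call
            lines.append(f"[tool call] {call.name}({call.arguments})")
        elif kind == "tool_return_message":
            lines.append(f"[tool return] {message.tool_return}")
        elif kind == "assistant_message":
            lines.append(f"[assistant] {message.content}")
        else:
            lines.append(f"[{kind}]")
    return lines


# Stand-in objects that mimic the assumed shape of a response's messages.
example = [
    SimpleNamespace(
        message_type="tool_call_message",
        tool_call=SimpleNamespace(
            name="search_research_papers",
            arguments='{"query_text": "multi-head attention"}',
        ),
    ),
    SimpleNamespace(message_type="tool_return_message", tool_return="...retrieved text..."),
    SimpleNamespace(message_type="assistant_message", content="...answer..."),
]

for line in summarize_messages(example):
    print(line)
```

In the real script, `summarize_messages(response.messages)` would take the place of the stand-in list.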

Next Steps

Now that you’ve integrated Agentic RAG with Letta, you can expand on this foundation:

Simple RAG

Learn how to manage retrieval on the client side for complete control.

Custom Tools

Explore creating more advanced custom tools for your agents.