Simple RAG with Letta
Manage retrieval on the client side and inject context into your agent
In the Simple RAG approach, your application manages the retrieval process. You query your vector database, retrieve the relevant documents, and include them directly in the message you send to your Letta agent.
By the end of this tutorial, you’ll have a research assistant that uses your vector database to answer questions about scientific papers.
Prerequisites
To follow along, you need free accounts for:
- Letta - To access the agent development platform
- Hugging Face - For generating embeddings (MongoDB and Qdrant users only)
- One of the following vector databases:
  - ChromaDB Cloud for a hosted vector database
  - MongoDB Atlas for vector search with MongoDB
  - Qdrant Cloud for a high-performance vector database
You will also need Python 3.8+ or Node.js v18+ and a code editor.
MongoDB and Qdrant users: This guide uses Hugging Face’s Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta’s sandbox environment.
Getting Your API Keys
We’ll need API keys for Letta and your chosen vector database.
Get your Letta API Key
Get your ChromaDB Cloud credentials
Get your MongoDB Atlas credentials
Create a Free Cluster
Click Build a Cluster and select the free tier (M0). Choose your preferred cloud provider and region, then click Create deployment.

Set Up Database Access
Next, set up connection security.
- Create a database user, then click Choose a connection method.
- Choose Drivers as the connection method, then select Python as the driver.
- Copy the entire connection string, including the query parameters at the end.
Make sure to replace <password> with your actual database user password. Keep all the query parameters (?retryWrites=true&w=majority&appName=Cluster0); they are required for proper connection configuration.
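As a quick sanity check before using the connection string, a short script can confirm that the placeholder was replaced and that the query parameters survived copying. This is a minimal sketch using only the standard library; the function name check_mongodb_uri and the host in the sample URI are hypothetical and not part of the tutorial's code.

```python
from urllib.parse import parse_qs

# Query parameters the guide asks you to keep in the connection string.
REQUIRED_PARAMS = ("retryWrites", "w", "appName")


def check_mongodb_uri(uri: str) -> list:
    """Return a list of problems found in an Atlas connection string."""
    problems = []
    if "<password>" in uri:
        problems.append("the <password> placeholder has not been replaced")
    # Everything after the first '?' is the query string.
    _, _, query_string = uri.partition("?")
    query = parse_qs(query_string)
    for name in REQUIRED_PARAMS:
        if name not in query:
            problems.append(f"missing query parameter: {name}")
    return problems


# Hypothetical URI for illustration only; use the one copied from Atlas.
sample_uri = (
    "mongodb+srv://user:<password>@cluster0.example.mongodb.net/"
    "?retryWrites=true&w=majority&appName=Cluster0"
)
for problem in check_mongodb_uri(sample_uri):
    print("Problem:", problem)
```

An empty list means the string passed both checks; it does not prove that the credentials themselves are valid.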

Configure Network Access (IP Whitelist)
By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect.
- Navigate to Database and Network Access in the left sidebar.
- Click Add IP Address.
- For local development and testing, select Allow Access From Anywhere. This will add the IP address 0.0.0.0/0.
- Click Confirm.

For a production environment, you would replace 0.0.0.0/0 with a secure list of static IP addresses provided by your hosting service (e.g., Letta).
Get your Qdrant Cloud credentials
Get your Hugging Face API Token (MongoDB & Qdrant users)
Create Access Token
Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).
The free tier includes 30,000 API requests per month, which is more than enough for development and testing.
Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:
ChromaDB
MongoDB Atlas
Qdrant
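Before running any of the scripts, it can help to confirm that the variables from your .env file are actually visible to the process. The sketch below assumes the file has already been loaded into the environment (for example with python-dotenv's load_dotenv()). Every variable name in it is an illustrative assumption except MONGODB_DB_NAME, which this guide mentions in the MongoDB Atlas index steps.

```python
import os

# Assumed variable names per database; adjust them to match your .env file.
REQUIRED_VARS = {
    "chromadb": ["LETTA_API_KEY", "CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE"],
    "mongodb": ["LETTA_API_KEY", "MONGODB_URI", "MONGODB_DB_NAME", "HF_API_KEY"],
    "qdrant": ["LETTA_API_KEY", "QDRANT_URL", "QDRANT_API_KEY", "HF_API_KEY"],
}


def missing_vars(database: str, env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[database] if not env.get(name)]


# Example: check the current process environment for the MongoDB setup.
missing = missing_vars("mongodb")
if missing:
    print("Missing environment variables:", ", ".join(missing))
```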
Step 1: Set Up the Vector Database
First, we need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo: “Attention Is All You Need” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.
Before we begin, let’s create a virtual environment to keep our dependencies isolated:
Python
TypeScript
Download the research papers using curl with the -L flag to follow redirects:
Verify the PDFs downloaded correctly:
You should see output indicating these are PDF documents, not HTML files.
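If you prefer to check from Python, a PDF file begins with the bytes %PDF-, whereas a failed download usually begins with HTML markup. A minimal sketch, assuming the papers were saved with a .pdf extension in the current directory (looks_like_pdf is a hypothetical helper name):

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file starts with the PDF magic bytes b'%PDF-'."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


# Report every .pdf file in the current directory.
for pdf_path in sorted(Path(".").glob("*.pdf")):
    status = "PDF document" if looks_like_pdf(pdf_path) else "NOT a PDF (possibly HTML)"
    print(f"{pdf_path.name}: {status}")
```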
Install the necessary packages for your chosen database:
ChromaDB
MongoDB Atlas
Qdrant
For Python, install with:
Now create a setup.py or setup.ts file to load the PDFs, split them into chunks, and ingest them into your database:
ChromaDB
MongoDB Atlas
Qdrant
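The chunking step of such a script can be sketched independently of the database. The example below assumes the PDF text has already been extracted into a string, and the chunk size and overlap values are illustrative defaults rather than the ones the original script uses.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size character chunks that overlap slightly.

    Overlap keeps sentences that straddle a boundary retrievable from
    either neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small demonstration with a short string and tiny chunk size.
for piece in split_into_chunks("abcdefghij", chunk_size=4, overlap=1):
    print(piece)
```

Each chunk would then be embedded and written to your chosen database together with any metadata you want to keep, such as the source paper.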
Run the script from your terminal:
Python
TypeScript
If you are using MongoDB Atlas, you must manually create a vector search index by following the steps below.
Create the Vector Search Index (MongoDB Atlas Only)
MongoDB Atlas users: The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow these steps carefully.
Select Database and Collection
- Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME)
- Collection: Select rag_collection
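For reference, an Atlas Vector Search index is described by a small JSON definition. The sketch below builds one in Python so it can be pasted into the Atlas JSON editor. The field path embedding and the dimension count 384 are placeholders that must match the field name and embedding model used in your setup script; they are assumptions, not values taken from this guide.

```python
import json


def vector_index_definition(path: str, num_dimensions: int, similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for one vector field."""
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# Placeholder values: replace with your document field and embedding size.
definition = vector_index_definition("embedding", 384)
print(json.dumps(definition, indent=2))
```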
Your vector database is now populated with research paper content and ready to query.
Step 2: Create a Simple Letta Agent
For the Simple RAG approach, the Letta agent doesn’t need any special tools or complex instructions. Its only job is to answer a question based on the context we provide. We can create this agent programmatically using the Letta SDK.
Create a file named create_agent.py or create_agent.ts:
Python
TypeScript
Run this script once to create the agent in your Letta project.

Step 3: Query, Format, and Ask
Now we’ll write the main script, simple_rag.py or simple_rag.ts, that ties everything together. This script will:
- Take a user’s question.
- Query your vector database to find the most relevant document chunks.
- Construct a detailed prompt that includes both the user’s question and the retrieved context.
- Send this combined prompt to our Simple Letta agent and print the response.
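The four steps above can be sketched with the retrieval and agent calls passed in as plain functions, which keeps the example independent of any particular database or SDK. The names build_prompt, simple_rag, retrieve, and send are hypothetical; in a real script, retrieve would wrap your vector database query and send would wrap the Letta SDK call that messages your agent.

```python
from typing import Callable, List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(
        f"[{index}] {chunk.strip()}" for index, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def simple_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    send: Callable[[str], str],
) -> str:
    """Retrieve context on the client side, then ask the agent."""
    chunks = retrieve(question)              # query the vector database
    prompt = build_prompt(question, chunks)  # inject the context
    return send(prompt)                      # forward to the agent
```

Passing retrieve and send as arguments also makes the pipeline easy to test with stand-in functions before any credentials are configured.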
ChromaDB
MongoDB Atlas
Qdrant
Replace your-agent-id with the actual ID of the agent you created in the previous step.
When you run this script, your application performs the retrieval, and the Letta agent provides the answer based on the context it receives. This gives you full control over the data pipeline.
Next Steps
Now that you’ve integrated Simple RAG with Letta, you can explore more advanced integration patterns.





