Simple RAG with Letta

In the Simple RAG approach, your application manages the retrieval process. You query your vector database, retrieve the relevant documents, and include them directly in the message you send to your Letta agent.

By the end of this tutorial, you’ll have a research assistant that uses your vector database to answer questions about scientific papers.

Prerequisites

To follow along, you need free accounts for:

- Letta: to access the agent development platform
- Hugging Face: for generating embeddings (MongoDB and Qdrant users only)
- One of the following vector databases:
  - ChromaDB Cloud for a hosted vector database
  - MongoDB Atlas for vector search with MongoDB
  - Qdrant Cloud for a high-performance vector database

You will also need Python 3.8+ or Node.js v18+ and a code editor.

MongoDB and Qdrant users: This guide uses Hugging Face’s Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta’s sandbox environment.

Getting Your API Keys

We’ll need API keys for Letta and your chosen vector database.

Get your Letta API Key

Create a Letta Account

If you don’t have one, sign up for a free account at letta.com.

Navigate to API Keys

Once logged in, click on API keys in the sidebar.

Create and Copy Your Key

Click + Create API key, give it a descriptive name, and click Confirm. Copy the key and save it somewhere safe.

Get your ChromaDB Cloud credentials

Create a ChromaDB Cloud Account

Create a New Database

From your dashboard, create a new database.

Get Your API Key and Host

In your project settings, you’ll find your API Key, Tenant, Database, and Host URL. We’ll need all of these for our scripts.

Get your MongoDB Atlas credentials

Create a MongoDB Atlas Account

Create a Free Cluster

Click Build a Cluster and select the free tier (M0). Choose your preferred cloud provider and region, then click Create deployment.

Set Up Database Access

Next, set up connection security.

Create a database user, then click Choose a connection method
Choose Drivers as the connection method, then select Python as the driver.
Copy the entire connection string, including the query parameters at the end. It will look like this:

mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0

Make sure to replace <password> with your actual database user password. Keep all the query parameters (?retryWrites=true&w=majority&appName=Cluster0); they are required for proper connection configuration.
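
If the password contains characters such as @, : or /, it needs to be percent-encoded before it goes into the connection string, otherwise the URI will not parse correctly. Here is a minimal sketch using only the standard library. The helper name build_mongodb_uri and the example credentials are hypothetical and only for illustration:

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str,
                      app_name: str = "Cluster0") -> str:
    """Build an Atlas SRV connection string with percent-encoded credentials.

    Hypothetical helper: the query parameters mirror the example string above.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{host}/"
        f"?retryWrites=true&w=majority&appName={app_name}"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own user, password, and cluster host.
    print(build_mongodb_uri("demo_user", "p@ss:word/1", "cluster0.xxxxx.mongodb.net"))
```

You could store the resulting string in your .env file instead of assembling it by hand.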

Configure Network Access (IP Whitelist)

By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect.

Navigate to Database and Network Access in the left sidebar.
Click Add IP Address.
For local development and testing, select Allow Access From Anywhere. This will add the IP address 0.0.0.0/0.
Click Confirm.

For a production environment, you would replace 0.0.0.0/0 with a secure list of static IP addresses provided by your hosting service (e.g., Letta).
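
To see why 0.0.0.0/0 is so permissive, it can help to check addresses against an allow list yourself. The sketch below uses Python’s ipaddress module. It only illustrates how CIDR matching works and is not how Atlas implements its access list. The is_allowed helper and the sample addresses (taken from the reserved documentation ranges) are hypothetical:

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allow_list: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR block in allow_list."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_list)


open_list = ["0.0.0.0/0"]          # equivalent to "Allow Access From Anywhere"
locked_list = ["203.0.113.10/32"]  # a single static address, for illustration

print(is_allowed("198.51.100.7", open_list))    # True: every IPv4 address matches
print(is_allowed("198.51.100.7", locked_list))  # False: not on the list
print(is_allowed("203.0.113.10", locked_list))  # True: exact match
```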

Get your Qdrant Cloud credentials

Create a Qdrant Cloud Account

Create a New Cluster

From your dashboard, click Clusters and then + Create. Select the free tier and choose your preferred region.

Get Your API Key and URL

Once your cluster is created, click on it to view details.

Copy the following:

API Key
Cluster URL

Get your Hugging Face API Token (MongoDB & Qdrant users)

Create a Hugging Face Account

Create Access Token

Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).

Generate New Token

Click New token, give it a name (e.g., “Letta RAG Demo”), select the Read role, and click Create token. Copy the token and save it securely.

The free tier includes 30,000 API requests per month, which is more than enough for development and testing.

Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:

ChromaDB

LETTA_API_KEY="..."
CHROMA_API_KEY="..."
CHROMA_TENANT="..."
CHROMA_DATABASE="..."

Step 1: Set Up the Vector Database

First, we need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo: “Attention Is All You Need” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

Before we begin, let’s create a virtual environment to keep our dependencies isolated:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Download the research papers using curl with the -L flag to follow redirects:

curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf
curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf

Verify the PDFs downloaded correctly:

file 1706.03762.pdf 1810.04805.pdf

You should see output indicating these are PDF documents, not HTML files.
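
If the file command is not available (for example on Windows), a similar check can be done in Python by looking at the first bytes of each file, since PDF files begin with the marker %PDF-. This is a minimal sketch, and the helper name looks_like_pdf is hypothetical:

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF marker '%PDF-'."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


if __name__ == "__main__":
    for name in ["1706.03762.pdf", "1810.04805.pdf"]:
        status = "OK" if looks_like_pdf(name) else "missing or not a PDF"
        print(f"{name}: {status}")
```

A file that fails this check is often an HTML error page saved under a .pdf name, in which case the download should be repeated.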

Install the necessary packages for your chosen database:

ChromaDB

# requirements.txt
letta-client
chromadb
pypdf
python-dotenv

For Python, install with:

pip install -r requirements.txt

Now create a setup.py or setup.ts file to load the PDFs, split them into page-sized chunks, and ingest them into your database:

ChromaDB

import os
import chromadb
import pypdf
from dotenv import load_dotenv

load_dotenv()

def main():
    # Connect to ChromaDB Cloud
    client = chromadb.CloudClient(
        tenant=os.getenv("CHROMA_TENANT"),
        database=os.getenv("CHROMA_DATABASE"),
        api_key=os.getenv("CHROMA_API_KEY")
    )

    # Create or get the collection
    collection = client.get_or_create_collection("rag_collection")

    # Ingest PDFs, one document per page
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if text:
                collection.add(
                    ids=[f"{pdf_file}-{i}"],
                    documents=[text]
                )

    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count()}")

if __name__ == "__main__":
    main()
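
The script above treats each PDF page as one chunk. If you want smaller, overlapping chunks instead, a simple character-window splitter is one option. The sketch below is an assumption rather than part of the original setup script. The function name chunk_text and the default sizes are hypothetical and worth tuning for your data:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
    # ['abcd', 'cdef', 'efgh', 'ghij']
```

To use it in the ingestion loop, you could call chunk_text on each page’s text and pass the resulting list to collection.add, giving every chunk a unique ID such as the file name, page index, and chunk index joined together.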

Run the script from your terminal:

python setup.py

If you are using MongoDB Atlas, you must manually create a vector search index by following the steps below.

Create the Vector Search Index (MongoDB Atlas Only)

MongoDB Atlas users: The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow these steps carefully.

Navigate to Atlas Search

Create Search Index

Click “Create Search Index” and choose Vector Search.

Select Database and Collection

Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME)
Collection: Select rag_collection

Name and Configure Index

Index Name: Enter vector_index (this exact name is required by the code)
Choose “JSON Editor” (not “Visual Editor”), then click Next.
Paste this JSON definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}

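Note: 384 is the output dimension of the BAAI/bge-small-en-v1.5 embedding model, which this guide accesses through Hugging Face. If you use a different model, numDimensions must match its output size.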
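
To make the two index settings concrete, the sketch below checks that an embedding has the expected length and computes cosine similarity between two vectors in plain Python. It illustrates the metric only and is not how Atlas computes or scales its scores. The names EXPECTED_DIMENSIONS, has_expected_dimensions, and cosine_similarity are hypothetical:

```python
import math
from typing import Sequence

EXPECTED_DIMENSIONS = 384  # should match "numDimensions" in the index definition


def has_expected_dimensions(embedding: Sequence[float],
                            expected: int = EXPECTED_DIMENSIONS) -> bool:
    """Return True if the embedding length matches the index configuration."""
    return len(embedding) == expected


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
    print(has_expected_dimensions([0.0] * 384))       # True
```

Checking the length of one embedding before ingestion is a cheap way to catch a mismatch between the model and the index definition.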

Create and Wait

Click Next, then click “Create Search Index”. The index will take a few minutes to build. Wait until the status shows as “Active” before proceeding.

Your vector database is now populated with research paper content and ready to query.

Step 2: Create a Simple Letta Agent

For the Simple RAG approach, the Letta agent doesn’t need any special tools or complex instructions. Its only job is to answer a question based on the context we provide. We can create this agent programmatically using the Letta SDK.

Create a file named create_agent.py or create_agent.ts:

Python

import os
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize the Letta client
client = Letta(token=os.getenv("LETTA_API_KEY"))

# Create the agent
agent = client.agents.create(
    name="Simple RAG Agent",
    description="This agent answers questions based on provided context. It has no tools or special memory.",
    memory_blocks=[
        {
            "label": "persona",
            "value": "You are a helpful research assistant. Answer the user's question based *only* on the context provided."
        }
    ]
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

Run this script once to create the agent in your Letta project.

Python

python create_agent.py

TypeScript

npx tsx create_agent.ts

Step 3: Query, Format, and Ask

Now we’ll write the main script, simple_rag.py or simple_rag.ts, that ties everything together. This script will:

1. Take a user’s question.
2. Query your vector database to find the most relevant document chunks.
3. Construct a detailed prompt that includes both the user’s question and the retrieved context.
4. Send this combined prompt to our Simple Letta agent and print the response.

ChromaDB

import os
import chromadb
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
letta_client = Letta(token=os.getenv("LETTA_API_KEY"))
chroma_client = chromadb.CloudClient(
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
    api_key=os.getenv("CHROMA_API_KEY")
)

AGENT_ID = "your-agent-id"  # Replace with your agent ID

def main():
    while True:
        question = input("\nAsk a question about the research papers: ")
        if question.lower() in ['exit', 'quit']:
            break

        # 1. Query ChromaDB
        collection = chroma_client.get_collection("rag_collection")
        results = collection.query(query_texts=[question], n_results=3)
        context = "\n".join(results["documents"][0])

        # 2. Construct the prompt
        prompt = f'''Context from research paper:
{context}

Question: {question}

Answer:'''

        # 3. Send to Letta Agent
        response = letta_client.agents.messages.create(
            agent_id=AGENT_ID,
            messages=[{"role": "user", "content": prompt}]
        )

        for message in response.messages:
            if message.message_type == 'assistant_message':
                print(f"\nAgent: {message.content}")

if __name__ == "__main__":
    main()

Replace your-agent-id with the actual ID of the agent you created in the previous step.
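
As an alternative to editing the script, you could read the agent ID from the environment alongside your other credentials. The sketch below assumes a hypothetical variable named LETTA_AGENT_ID, which is not part of the .env file described earlier, and a hypothetical helper resolve_agent_id:

```python
import os
from typing import Mapping

PLACEHOLDER = "your-agent-id"  # the unedited value from the script above


def resolve_agent_id(env: Mapping[str, str], var_name: str = "LETTA_AGENT_ID") -> str:
    """Return the agent ID stored in env, rejecting empty or placeholder values."""
    agent_id = env.get(var_name, "").strip()
    if not agent_id or agent_id == PLACEHOLDER:
        raise ValueError(
            f"Set {var_name} to the ID printed when you created the agent"
        )
    return agent_id


if __name__ == "__main__":
    try:
        print(f"Using agent: {resolve_agent_id(os.environ)}")
    except ValueError as error:
        print(f"Configuration problem: {error}")
```

In the main script, AGENT_ID could then be assigned from resolve_agent_id(os.environ) after load_dotenv() has run.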

When you run this script, your application performs the retrieval, and the Letta agent provides the answer based on the context it receives. This gives you full control over the data pipeline.
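
Because your application owns the retrieval step, you can shape the context before the agent sees it. As one illustration, the sketch below labels each retrieved chunk with its ID and optionally drops chunks whose distance exceeds a threshold. It assumes the result layout ChromaDB returns from collection.query (lists of lists under "ids", "documents", and "distances"). The helper name format_context and the threshold value are hypothetical:

```python
from typing import Any, List, Mapping, Optional


def format_context(results: Mapping[str, Any],
                   max_distance: Optional[float] = None) -> str:
    """Build a labeled context string from the first query's results.

    Chunks with a distance above max_distance are skipped when a threshold
    is given; lower distances mean closer matches.
    """
    ids = results["ids"][0]
    documents = results["documents"][0]
    # Distances may be absent if they were not requested from the database.
    distances = (results.get("distances") or [[None] * len(ids)])[0]

    sections: List[str] = []
    for doc_id, text, distance in zip(ids, documents, distances):
        if max_distance is not None and distance is not None and distance > max_distance:
            continue
        sections.append(f"[Source: {doc_id}]\n{text}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    sample = {
        "ids": [["1706.03762.pdf-0", "1810.04805.pdf-3"]],
        "documents": [["First chunk of text.", "Second chunk of text."]],
        "distances": [[0.2, 0.9]],
    }
    print(format_context(sample, max_distance=0.5))
```

The returned string could replace the plain join used for the context variable in the script, so the agent can mention which page an answer came from.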

Next Steps

Now that you’ve integrated Simple RAG with Letta, you can explore more advanced integration patterns:

Agentic RAG

Learn how to empower your agent with custom search tools for autonomous retrieval.

Custom Tools

Explore creating more advanced custom tools for your agents.