Simple RAG with Letta

In the simple RAG approach, your application manages the retrieval process. You query your vector database, retrieve the relevant documents, and include them directly in the message you send to your Letta agent.
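
The whole pattern fits in a few lines. Here is a conceptual sketch in Python; retrieve_from_vector_db and send_to_agent are hypothetical placeholders for the real calls you'll build in this tutorial:

# Conceptual sketch of simple RAG (placeholder functions, not real APIs)
question = "What is multi-head attention?"
chunks = retrieve_from_vector_db(question, top_k=3)  # your app does the retrieval
context = "\n".join(chunks)
prompt = f"Context:\n{context}\n\nQuestion: {question}"
answer = send_to_agent(prompt)  # the agent only has to answer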

By the end of this tutorial, you’ll have a research assistant that uses your vector database to answer questions about scientific papers.

To follow along, you need free accounts for:

  • Letta: To access the agent development platform
  • Hugging Face: To generate embeddings (MongoDB and Qdrant users only)
  • One of the following vector database platforms: ChromaDB Cloud, MongoDB Atlas, or Qdrant Cloud

You also need a code editor and either Python (version 3.8 or later) or Node.js (version 18 or later).

You’ll need API keys for Letta and your chosen vector database.

Get your Letta API key
  1. Create a Letta account

    If you don’t have one, sign up for a free account at letta.com.

  2. Navigate to API keys

    Once logged in, click API keys in the sidebar.

  3. Create and copy your key

    Click + Create API key, give it a descriptive name, and click Confirm. Copy the key and save it somewhere safe.

Get your ChromaDB Cloud credentials
  1. Create a ChromaDB Cloud account

    Sign up for a free account on the ChromaDB Cloud website.

  2. Create a new database

    From your dashboard, create a new database.

  3. Get your API key and host

    Click the Configure Chroma SDK card, open your preferred language tab, and click Create API key and copy code to generate a key. For this example, you need the API Key, Tenant, Database, and Host URL.
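
Once you have these values, you can confirm they work before going further. A minimal sanity check, assuming the values are stored in the environment variables this tutorial uses:

import os

import chromadb

# heartbeat() raises if the credentials or host are wrong
client = chromadb.CloudClient(
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
    api_key=os.getenv("CHROMA_API_KEY"),
)
print(client.heartbeat())  # prints a timestamp on success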

Get your MongoDB Atlas credentials
  1. Create a MongoDB Atlas account

    Sign up for a free account at mongodb.com/cloud/atlas/register.

  2. Create a free cluster

    Click Build a Cluster and select the free tier. Choose your preferred cloud provider and region, and click Create deployment.

  3. Set up database access

    Next, set up connection security:

    • On the security quickstart, create a database user, and then click Finish and Close.
    • On the cluster creation card, click Connect.
    • Choose Drivers to connect to your application, and select Python as the driver.
    • Copy the entire connection string, including the query parameters at the end. It will look like this:
    mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0

  4. Configure network access (IP whitelist)

    By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect.

    • Navigate to Database and Network Access in the left sidebar.
    • In the IP Access List tab, click + Add IP Address.
    • For local development and testing, select Allow Access From Anywhere. This adds the IP address 0.0.0.0/0, which is fine for this tutorial but too permissive for production.
    • Click Confirm.

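With the connection string saved and network access open, it's worth verifying connectivity before ingesting anything. A minimal check, assuming you store the connection string in a MONGODB_URI environment variable (the name this tutorial uses):

import os

from pymongo import MongoClient

# "ping" confirms both the credentials and the IP access list
client = MongoClient(os.getenv("MONGODB_URI"))
client.admin.command("ping")
print("Connected to MongoDB Atlas")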

Get your Qdrant Cloud credentials
  1. Create a Qdrant Cloud account

    Sign up for a free account at cloud.qdrant.io.

  2. Create a new cluster

    From your dashboard, click Clusters and then + Create. Select the free tier and choose your preferred region.


  3. Get your API key and URL

    Once your cluster has been created, click on it to view the details.

    Copy the following values:

    • API Key
    • Cluster URL

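You can verify these credentials with a quick round trip, assuming they're stored in QDRANT_URL and QDRANT_API_KEY environment variables:

import os

from qdrant_client import QdrantClient

# Listing collections confirms the URL and API key are valid
client = QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))
print(client.get_collections())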

Get your Hugging Face API token (MongoDB and Qdrant users)
  1. Create a Hugging Face account

    Sign up for a free account at huggingface.co.

  2. Create an access token

    Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).

  3. Generate a new token

    Click New token, give it a name (such as Letta RAG Demo), select the Read role, and click Create token. Copy the token and save it securely.
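
MongoDB and Qdrant users will use this token to generate embeddings with the BAAI/bge-small-en-v1.5 model (the model the 384-dimension index below is sized for). As a quick test, here's a sketch that embeds a sentence through the huggingface_hub client, assuming the token is stored in an HF_TOKEN environment variable:

import os

from huggingface_hub import InferenceClient

client = InferenceClient(token=os.getenv("HF_TOKEN"))
embedding = client.feature_extraction(
    "The Transformer relies entirely on attention.",
    model="BAAI/bge-small-en-v1.5",
)
print(embedding.shape)  # expect 384 dimensions (exact shape may vary by pipeline)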

Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:

.env
LETTA_API_KEY="..."
CHROMA_API_KEY="..."
CHROMA_TENANT="..."
CHROMA_DATABASE="..."
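
The example above uses the ChromaDB variables. If you chose MongoDB Atlas or Qdrant, your file will look more like this; MONGODB_DB_NAME is referenced by the index setup later, while the other names are simply the conventions this tutorial assumes:

LETTA_API_KEY="..."
# MongoDB Atlas
MONGODB_URI="..."
MONGODB_DB_NAME="rag_demo"
HF_TOKEN="..."
# Qdrant
QDRANT_URL="..."
QDRANT_API_KEY="..."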

First, you need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo:

  • “Attention Is All You Need” (arXiv:1706.03762)
  • “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (arXiv:1810.04805)

Before we begin, let’s create a Python virtual environment to keep our dependencies isolated:

Terminal window
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate

Download the research papers using curl with the -L flag to follow redirects:

curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf
curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf

Verify that the PDFs downloaded correctly:

file 1706.03762.pdf 1810.04805.pdf

You should see output indicating these are PDF documents, not HTML files.

Install the necessary packages for your chosen database:

# add dependencies to a requirements.txt file
letta-client
chromadb
pypdf
python-dotenv

For Python, install the packages with the following command:

Terminal window
pip install -r requirements.txt

Now create a setup.py or setup.ts file to load the PDFs, split them into chunks, and ingest them into your database:

import os

import chromadb
import pypdf
from dotenv import load_dotenv

load_dotenv()


def main():
    # Connect to ChromaDB Cloud
    client = chromadb.CloudClient(
        tenant=os.getenv("CHROMA_TENANT"),
        database=os.getenv("CHROMA_DATABASE"),
        api_key=os.getenv("CHROMA_API_KEY"),
    )

    # Create or get the collection
    collection = client.get_or_create_collection("rag_collection")

    # Ingest the PDFs page by page; each page becomes one document
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if text:
                collection.add(
                    ids=[f"{pdf_file}-{i}"],
                    documents=[text],
                )

    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count()}")


if __name__ == "__main__":
    main()

Run the script from your terminal:

Terminal window
python setup.py
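
Before wiring up the agent, it's worth confirming that retrieval returns something sensible. A quick check against the collection the script just populated:

import os

import chromadb
from dotenv import load_dotenv

load_dotenv()

client = chromadb.CloudClient(
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
    api_key=os.getenv("CHROMA_API_KEY"),
)
collection = client.get_collection("rag_collection")

# Fetch the single closest page to a test question
results = collection.query(query_texts=["What is multi-head attention?"], n_results=1)
print(results["ids"][0])                 # a page id such as "1706.03762.pdf-4"
print(results["documents"][0][0][:200])  # first 200 characters of that page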

If you’re using MongoDB Atlas, manually create a vector search index by following the steps below.

Create the Vector Search Index (MongoDB Atlas only)
  1. Navigate to Atlas Search

    Log in to your MongoDB Atlas dashboard, and click on Search & Vector Search in the sidebar.

  2. Create a search index

    Click + Create Search Index, and choose Vector Search.

  3. Select the database and collection

    • Index Name: Enter vector_index (this exact name is required by the code).
    • Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME).
    • Collection: Select rag_collection.
  4. Configure the index

    • Choose JSON Editor (not Visual Editor) in the configuration method section. Click Next.
    • Copy and paste this JSON definition as the configuration:
    {
      "fields": [
        {
          "type": "vector",
          "path": "embedding",
          "numDimensions": 384,
          "similarity": "cosine"
        }
      ]
    }

    Note: 384 is the output dimension of Hugging Face’s BAAI/bge-small-en-v1.5 embedding model.

  5. Create and wait

    Click Next, then click Create Vector Search Index. The index takes a few minutes to build. Wait until it displays an Active status before proceeding.
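
Once the index is Active, documents whose embedding field matches the definition above can be searched with MongoDB's $vectorSearch aggregation stage. Here's a minimal sketch; it assumes your ingestion script stored 384-dimensional vectors in an embedding field and the page text in a text field (adjust the field names to match your schema):

import os

from huggingface_hub import InferenceClient
from pymongo import MongoClient

mongo = MongoClient(os.getenv("MONGODB_URI"))
collection = mongo[os.getenv("MONGODB_DB_NAME", "rag_demo")]["rag_collection"]

# Embed the question with the same model the index was sized for
hf = InferenceClient(token=os.getenv("HF_TOKEN"))
query_vector = hf.feature_extraction(
    "What is multi-head attention?", model="BAAI/bge-small-en-v1.5"
).tolist()

results = collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",    # must match the index name created above
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 3,
        }
    },
    {"$project": {"text": 1, "_id": 0}},  # "text" is the assumed content field
])
for doc in results:
    print(doc)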

Your vector database is now populated with research paper content and ready to query.

For the simple RAG approach, the Letta agent doesn’t need any special tools or complex instructions. Its only job is to answer a question based on the context we provide. We can create this agent programmatically using the Letta SDK.

Create a file named create_agent.py or create_agent.ts:

import os

from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize the Letta client
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create the agent
agent = client.agents.create(
    name="Simple RAG Agent",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    description="This agent answers questions based on provided context. It has no tools or special memory.",
    memory_blocks=[
        {
            "label": "persona",
            "value": "You are a helpful research assistant. Answer the user's question based *only* on the context provided.",
        }
    ],
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

Run this script once to create the agent in your Letta project:

Terminal window
python create_agent.py
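
The script prints the agent's ID, which the chat script in the next step needs. If you lose it, you can look it up again by name with the SDK's agents.list() method (agents are also visible in the Letta dashboard):

import os

from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

client = Letta(api_key=os.getenv("LETTA_API_KEY"))
# Find our agent by name and print its ID
for agent in client.agents.list():
    if agent.name == "Simple RAG Agent":
        print(agent.id)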


Now we’ll write the main script, simple_rag.py or simple_rag.ts, that ties everything together. This script will:

  • Take a user’s question
  • Query your vector database to find the most relevant document chunks
  • Construct a detailed prompt that includes both the user’s question and the retrieved context
  • Send this combined prompt to your simple Letta agent and print the response

import os

import chromadb
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
letta_client = Letta(api_key=os.getenv("LETTA_API_KEY"))
chroma_client = chromadb.CloudClient(
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
    api_key=os.getenv("CHROMA_API_KEY"),
)

AGENT_ID = "your-agent-id"  # Replace with your agent ID


def main():
    while True:
        question = input("\nAsk a question about the research papers: ")
        if question.lower() in ["exit", "quit"]:
            break

        # 1. Query ChromaDB for the most relevant chunks
        collection = chroma_client.get_collection("rag_collection")
        results = collection.query(query_texts=[question], n_results=3)
        context = "\n".join(results["documents"][0])

        # 2. Construct the prompt
        prompt = f'''Context from research paper:
{context}
Question: {question}
Answer:'''

        # 3. Send to the Letta agent
        response = letta_client.agents.messages.create(
            agent_id=AGENT_ID,
            messages=[{"role": "user", "content": prompt}],
        )
        for message in response.messages:
            if message.message_type == "assistant_message":
                print(f"\nAgent: {message.content}")


if __name__ == "__main__":
    main()

When you run this script, your application performs the retrieval, and the Letta agent provides the answer based on the context it receives. This gives you full control over the data pipeline.
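
That control also makes the pipeline easy to extend. For example, because the ingestion script used "{pdf_file}-{page}" as document IDs, you can surface those IDs as lightweight citations. One possible tweak to the context-building step in simple_rag.py:

# Replace the context construction inside the loop with:
results = collection.query(query_texts=[question], n_results=3)
labeled_chunks = [
    f"[Source: {doc_id}]\n{doc}"
    for doc_id, doc in zip(results["ids"][0], results["documents"][0])
]
context = "\n\n".join(labeled_chunks)
# The agent can now say which paper and page each claim came from.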

Now that you’ve integrated simple RAG with Letta, you can explore more advanced integration patterns:

Agentic RAG

Learn how to empower your agent with custom search tools for autonomous retrieval.

Custom tools

Explore creating more advanced custom tools for your agents.