Agentic RAG with Letta

Empower your agent with custom search tools for autonomous retrieval

In the Agentic RAG approach, we delegate the retrieval process to the agent itself. Instead of your application deciding what to search for, we provide the agent with a custom tool that allows it to query your vector database directly. This makes the agent more autonomous and your client-side code much simpler.

By the end of this tutorial, you’ll have a research assistant that autonomously decides when to search your vector database and what queries to use.

Prerequisites

To follow along, you need free accounts for:

  • Letta - To access the agent development platform
  • Hugging Face - For generating embeddings (MongoDB and Qdrant users only)
  • One of the following vector databases: ChromaDB Cloud, MongoDB Atlas, or Qdrant

You will also need Python 3.8+ or Node.js v18+ and a code editor.

MongoDB and Qdrant users: This guide uses Hugging Face’s Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta’s sandbox environment.

Getting Your API Keys

We’ll need API keys for Letta and your chosen vector database.

1. Create a Letta Account

If you don’t have one, sign up for a free account at letta.com.

3. Create and Copy Your Key

Click + Create API key, give it a descriptive name, and click Confirm. Copy the key and save it somewhere safe.

1. Create a ChromaDB Cloud Account

Sign up for a free account on the ChromaDB Cloud website.

2. Create a New Database

From your dashboard, create a new database.

3. Get Your API Key and Host

In your project settings, you’ll find your API Key, Tenant, Database, and Host URL. We’ll need all of these for our scripts.

1. Create a Hugging Face Account

Sign up for a free account at huggingface.co.

2. Create Access Token

Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).

3. Generate New Token

Click New token, give it a name (e.g., “Letta RAG Demo”), select the Read role, and click Create token. Copy the token and save it securely.

The free tier includes 30,000 API requests per month, which is more than enough for development and testing.

Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:

LETTA_API_KEY="..."
CHROMA_API_KEY="..."
CHROMA_TENANT="..."
CHROMA_DATABASE="..."
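
Before moving on, it can help to confirm that every credential is actually set. The following is a minimal sketch, assuming the ChromaDB variable names shown above; `missing_env_vars` is a hypothetical helper, not part of Letta or ChromaDB. It reads from `os.environ` by default, so call `load_dotenv()` first if you rely on the .env file.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env example above (ChromaDB variant).
REQUIRED_VARS = ["LETTA_API_KEY", "CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE"]


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All credentials are set.")
```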

Step 1: Set Up the Vector Database

First, we need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo: “Attention Is All You Need” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

Before we begin, let’s set up our development environment:

# Create a Python virtual environment to keep dependencies isolated
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

TypeScript users must update package.json to use ES modules:

"type": "module"

Download the research papers using curl with the -L flag to follow redirects:

curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf
curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf

Verify the PDFs downloaded correctly:

file 1706.03762.pdf 1810.04805.pdf

You should see output indicating these are PDF documents, not HTML files.
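
If the `file` command is not available (for example, on Windows), a rough equivalent is to check each file's leading bytes, since PDF files begin with the signature `%PDF-`. This is a minimal sketch; `looks_like_pdf` is a hypothetical helper name introduced here for illustration.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature b'%PDF-'."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


if __name__ == "__main__":
    # File names match the curl commands above.
    for name in ["1706.03762.pdf", "1810.04805.pdf"]:
        status = "PDF" if looks_like_pdf(name) else "missing or not a PDF"
        print(f"{name}: {status}")
```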

Install the necessary packages for your chosen database:

# requirements.txt
letta-client
chromadb
pypdf
python-dotenv

For Python, install with:

pip install -r requirements.txt

Now create a setup.py or setup.ts file to load the PDFs, split them into page-sized chunks, and ingest them into your database:

import os

import chromadb
import pypdf
from dotenv import load_dotenv

load_dotenv()


def main():
    # Connect to ChromaDB Cloud using the credentials from .env
    client = chromadb.CloudClient(
        tenant=os.getenv("CHROMA_TENANT"),
        database=os.getenv("CHROMA_DATABASE"),
        api_key=os.getenv("CHROMA_API_KEY"),
    )

    # Create or get the collection
    collection = client.get_or_create_collection("rag_collection")

    # Ingest each PDF, storing one document per page
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if not text or not text.strip():
                continue  # Skip pages with no extractable text
            collection.add(
                ids=[f"{pdf_file}-{i}"],
                documents=[text],
            )

    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count()}")


if __name__ == "__main__":
    main()

Run the script from your terminal:

python setup.py
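
The setup script stores one document per PDF page. If pages prove too coarse for retrieval, you could split each page into smaller overlapping chunks before calling `collection.add`. The following is a minimal sketch of character-based chunking; `chunk_text` and its default sizes are illustrative assumptions, not something the setup script uses.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Inside the ingestion loop, each chunk would then need its own unique ID, for example `f"{pdf_file}-{i}-{j}"`, where `j` is the chunk's index within the page.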

MongoDB Atlas users: The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow the steps below carefully.

2. Create Search Index

Click “Create Search Index”, then choose “JSON Editor” (not “Visual Editor”).

3. Select Database and Collection

  • Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME)
  • Collection: Select rag_collection

4. Name and Configure Index

  • Index Name: Enter vector_index (this exact name is required by the code)
  • Paste this JSON definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}

Note: The value 384 matches the embedding size of the BAAI/bge-small-en-v1.5 model used through Hugging Face. If you switch models, update numDimensions to match.

5. Create and Wait

Click “Create Search Index”. The index will take a few minutes to build. Wait until the status shows as “Active” before proceeding.
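
For intuition about the `"similarity": "cosine"` setting in the index definition, cosine similarity compares the direction of two embedding vectors and ignores their magnitude. The sketch below computes it in plain Python and checks that a vector has the 384 dimensions the index expects. The function names are hypothetical, and the database performs the real computation for you.

```python
import math
from typing import Sequence

EXPECTED_DIMENSIONS = 384  # Matches "numDimensions" in the index definition.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def has_expected_dimensions(vector: Sequence[float], expected: int = EXPECTED_DIMENSIONS) -> bool:
    """Return True if the vector length matches the index's numDimensions."""
    return len(vector) == expected
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.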

Your vector database is now populated with research paper content and ready to query.

Step 2: Create a Custom Search Tool

A Letta tool is a Python function that your agent can call. We’ll create a function that searches your vector database and returns the results. Letta handles the complexities of exposing this function to the agent securely.

TypeScript users: Letta tools execute in Python, even when called from TypeScript. Create a tools.ts file that exports the Python code as a string constant, which you’ll use in Step 3 to create the tool.

Create a new file named tools.py (Python) or tools.ts (TypeScript) with the appropriate implementation for your database:

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found, joined into a single string.
    """
    import chromadb
    import os

    # ChromaDB Cloud Client
    # This tool code is executed on the Letta server. It expects the ChromaDB
    # credentials to be passed as environment variables.
    api_key = os.getenv("CHROMA_API_KEY")
    tenant = os.getenv("CHROMA_TENANT")
    database = os.getenv("CHROMA_DATABASE")

    if not all([api_key, tenant, database]):
        raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.")

    client = chromadb.CloudClient(
        tenant=tenant,
        database=database,
        api_key=api_key
    )

    collection = client.get_or_create_collection("rag_collection")

    try:
        results = collection.query(
            query_texts=[query_text],
            n_results=n_results
        )

        # results["documents"] holds one list of matches per query text
        documents = results["documents"][0]
        if not documents:
            return "No matching documents found."
        return "\n\n".join(documents)
    except Exception as e:
        return f"Tool failed with error: {e}"

This function takes a query, connects to your database, retrieves the most relevant documents, and returns them as a single string.
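
When `n_results` is greater than 1, the agent benefits from seeing where one passage ends and the next begins. The following sketch formats a ChromaDB-style query result (a dictionary whose `ids` and `documents` entries hold one inner list per query) into a single labeled string. `format_results` is a hypothetical helper. If you adopt something like it, keep the logic inside the tool function, in the same way the tool keeps its imports inside the function body.

```python
from typing import Any, Dict, List


def format_results(results: Dict[str, List[List[Any]]]) -> str:
    """Join the documents returned for the first query into one labeled string."""
    documents = (results.get("documents") or [[]])[0]
    ids = (results.get("ids") or [[]])[0]
    if not documents:
        return "No matching documents found."

    sections = []
    for position, document in enumerate(documents):
        # Fall back to a positional label if no ID is available.
        doc_id = ids[position] if position < len(ids) else f"result-{position + 1}"
        sections.append(f"[{doc_id}]\n{document}")
    return "\n\n---\n\n".join(sections)
```

Because the setup script builds IDs from the file name and page index, each label tells the agent which paper and page a passage came from.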

Step 3: Configure an Agentic Research Assistant

Next, we’ll create a new agent. This agent will have a specific persona that instructs it on how to behave and, most importantly, it will be equipped with our new search tool.

Create a file named create_agentic_agent.py (Python) or create_agentic_agent.ts (TypeScript):

import os
from letta_client import Letta
from dotenv import load_dotenv
from tools import search_research_papers

load_dotenv()

# Initialize the Letta client
client = Letta(token=os.getenv("LETTA_API_KEY"))

# Create a tool from our Python function
search_tool = client.tools.create_from_function(func=search_research_papers)

# Define the agent's persona
persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool."""

# Create the agent with the tool attached
agent = client.agents.create(
    name="Agentic RAG Assistant",
    description="A smart agent that can search a vector database to answer questions.",
    memory_blocks=[
        {
            "label": "persona",
            "value": persona
        }
    ],
    tools=[search_tool.name]
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

TypeScript users: Notice how the TypeScript version imports searchResearchPapersToolCode from tools.ts (the file you created in Step 2). This keeps the code organized, just like the Python version imports from tools.py.

Run this script once to create the agent in your Letta project:

python create_agentic_agent.py

Configure Tool Dependencies and Environment Variables

For the tool to work within Letta’s environment, we need to configure its dependencies and environment variables through the Letta dashboard.

1. Find your agent

Navigate to your Letta dashboard and find the “Agentic RAG Assistant” agent you just created.

2. Access the ADE

Click on your agent to open the Agent Development Environment (ADE).

3. Configure Dependencies

In the ADE, select Tools from the sidebar, find and click on the search_research_papers tool, then click on the Dependencies tab.

Add the following dependencies based on your database:

chromadb

4. Configure Environment Variables

In the same tool configuration, navigate to Simulator > Environment.

Add the following environment variables with their corresponding values from your .env file:

CHROMA_API_KEY
CHROMA_TENANT
CHROMA_DATABASE

Make sure to click the upload button next to each environment variable so that the agent is updated with it.

Now, when the agent calls this tool, Letta’s execution environment will know to install the necessary dependencies and will have access to the necessary credentials to connect to your database.
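
A missing dependency may only surface when the agent first calls the tool. One way to catch the problem earlier on your own machine is to confirm that each package the tool imports can be found. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and it checks your local interpreter, not Letta's execution environment.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found locally."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "chromadb" is the dependency listed for the ChromaDB variant of the tool.
    missing = missing_modules(["chromadb"])
    print("Missing:", missing if missing else "none")
```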

Step 4: Let the Agent Lead the Conversation

With the agentic setup, our client-side code becomes much simpler. We no longer need to retrieve context ourselves; we just send the user’s raw question to the agent and let it handle the rest.

Create the agentic_rag.py or agentic_rag.ts script:

import os
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize client
letta_client = Letta(token=os.getenv("LETTA_API_KEY"))

AGENT_ID = "your-agentic-agent-id"  # Replace with your new agent ID


def main():
    while True:
        user_query = input("\nAsk a question about the research papers: ")
        if user_query.lower() in ['exit', 'quit']:
            break

        response = letta_client.agents.messages.create(
            agent_id=AGENT_ID,
            messages=[{"role": "user", "content": user_query}]
        )

        for message in response.messages:
            if message.message_type == 'assistant_message':
                print(f"\nAgent: {message.content}")


if __name__ == "__main__":
    main()

Replace your-agentic-agent-id with the ID of the new agent you just created.
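
To avoid editing the script each time you recreate the agent, you could read the ID from an environment variable instead. In this sketch, `LETTA_AGENT_ID` is a hypothetical variable name chosen for illustration (it is not something Letta defines), and `resolve_agent_id` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER_ID = "your-agentic-agent-id"


def resolve_agent_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the agent ID from LETTA_AGENT_ID, rejecting blank or placeholder values."""
    env = os.environ if env is None else env
    agent_id = env.get("LETTA_AGENT_ID", "").strip()
    if not agent_id or agent_id == PLACEHOLDER_ID:
        raise ValueError(
            "Set LETTA_AGENT_ID to the ID printed by create_agentic_agent.py."
        )
    return agent_id
```

With this in place, `AGENT_ID = resolve_agent_id()` fails immediately with a clear message instead of sending a request with the placeholder ID.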

When you run this script, the agent receives the question, understands from its persona that it needs to search for information, calls the search_research_papers tool, gets the context, and then formulates an answer. All the RAG logic is handled by the agent, not your application.
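
To watch this sequence happen, you can print more than the final answer. The script in this step already relies on each message having a `message_type`; the sketch below branches on that field. The `tool_call_message` and `tool_return_message` type names and the `tool_call.name`, `tool_call.arguments`, and `tool_return` attributes are assumptions about the response shape, so the code reads them defensively with `getattr`. `describe_messages` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real response messages.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def describe_messages(messages: Iterable[Any]) -> List[str]:
    """Summarize each message in a response as one human-readable line."""
    lines = []
    for message in messages:
        kind = getattr(message, "message_type", "unknown")
        if kind == "tool_call_message":  # Assumed type name
            call = getattr(message, "tool_call", None)
            name = getattr(call, "name", "?")
            arguments = getattr(call, "arguments", "")
            lines.append(f"Tool call: {name}({arguments})")
        elif kind == "tool_return_message":  # Assumed type name
            returned = str(getattr(message, "tool_return", ""))
            lines.append(f"Tool returned: {returned[:80]}")
        elif kind == "assistant_message":
            lines.append(f"Agent: {getattr(message, 'content', '')}")
        else:
            lines.append(f"({kind})")
    return lines


# Stand-in messages for demonstration only; real ones come from response.messages.
example = [
    SimpleNamespace(
        message_type="tool_call_message",
        tool_call=SimpleNamespace(
            name="search_research_papers",
            arguments='{"query_text": "self-attention"}',
        ),
    ),
    SimpleNamespace(message_type="tool_return_message", tool_return="(retrieved passage)"),
    SimpleNamespace(message_type="assistant_message", content="(final answer)"),
]

for line in describe_messages(example):
    print(line)
```

In the chat loop, replacing the `assistant_message` filter with `describe_messages(response.messages)` would show which query the agent chose to run before it answered.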

Next Steps

Now that you’ve integrated Agentic RAG with Letta, you can expand on this foundation: