Agentic RAG with Letta

In the agentic RAG approach, we delegate the retrieval process to the agent itself. Instead of your application deciding what to search for, we provide the agent with a custom tool that allows it to query your vector database directly. This makes the agent more autonomous and your client-side code much simpler.

By the end of this tutorial, you’ll have a research assistant that autonomously decides when to search your vector database and what queries to use.

Prerequisites

To follow along, you need free accounts for:

- Letta: To access the agent development platform
- Hugging Face: To generate embeddings (MongoDB and Qdrant users only)
- One of the following vector database platforms:
  - ChromaDB Cloud: For a hosted vector database
  - MongoDB Atlas: For vector search with MongoDB
  - Qdrant Cloud: For a high-performance vector database

You also need a code editor and either Python (version 3.8 or later) or Node.js (version 18 or later).

Getting your API keys

You need API keys for Letta and your chosen vector database.

Get your Letta API key

Create a Letta account

If you don’t have one, sign up for a free account at letta.com.

Navigate to API keys

Once logged in, click API keys in the sidebar.

Create and copy your key

Click + Create API key, give it a descriptive name, and click Confirm. Copy the key and save it somewhere safe.

Get your vector database credentials

Create a ChromaDB Cloud account

Sign up for a free account on the ChromaDB Cloud website.

Create a new database

From your dashboard, create a new database.

Get your API key and host

In your project settings, find your API Key, Tenant, Database, and Host URL. You’ll need all four values for your scripts.

Create a MongoDB Atlas account

Sign up for a free account at mongodb.com/cloud/atlas/register.

Create a free cluster

Click Build a Cluster and select the free tier (M0). Choose your preferred cloud provider and region, then click Create deployment.

Set up database access

Next, set up connection security:
- Create a database user, then click Choose a connection method.
- Choose Drivers to connect to your application, and select Python as the driver.
- Copy the entire connection string, including the query parameters at the end. It will look like this:
```
mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0
```
Replace <password> with your database user’s actual password. Keep all the query parameters (?retryWrites=true&w=majority&appName=Cluster0); they are required to configure the connection properly.

Configure network access (IP whitelist)

By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect.
- Navigate to Database and Network Access in the left sidebar.
- Click Add IP Address.
- For local development and testing, select Allow Access From Anywhere. This adds the IP address 0.0.0.0/0.
- Click Confirm.
For a production environment, you would replace 0.0.0.0/0 with a secure list of static IP addresses provided by your hosting service (for example, Letta).
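When you substitute your password into the connection string, note that characters such as @, :, or / must be percent-encoded, otherwise the URI cannot be parsed correctly. The following is a minimal sketch of building the URI with Python's standard library. The helper name `build_mongodb_uri` and the sample credentials are hypothetical placeholders, not values from Atlas:

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str) -> str:
    """Assemble an Atlas-style URI with percent-encoded credentials."""
    # Query parameters copied from the example connection string in this tutorial.
    params = "retryWrites=true&w=majority&appName=Cluster0"
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/?{params}"
    )


# Hypothetical credentials, for illustration only.
uri = build_mongodb_uri("demo_user", "p@ss:word/1", "cluster0.xxxxx.mongodb.net")
print(uri)
```

If your password contains only letters and digits, the encoded form is identical to the original, so this step is harmless either way.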

Get your Hugging Face API token (MongoDB and Qdrant users)

Create a Hugging Face account

Sign up for a free account at huggingface.co.

Create an access token

Click the profile icon in the top right. Navigate to Settings > Access Tokens (or go directly to huggingface.co/settings/tokens).

Generate a new token

Click New token, give it a name (such as Letta RAG Demo), select the Read role, and click Create token. Copy the token and save it securely.

Once you have these credentials, create a .env file in your project directory. Add the credentials for your chosen database:

LETTA_API_KEY="..."
CHROMA_API_KEY="..."
CHROMA_TENANT="..."
CHROMA_DATABASE="..."

LETTA_API_KEY="..."
MONGODB_URI="mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"
MONGODB_DB_NAME="rag_demo"
HF_API_KEY="..."

LETTA_API_KEY="..."
QDRANT_URL="https://xxxxx.cloud.qdrant.io"
QDRANT_API_KEY="..."
HF_API_KEY="..."
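A missing or misspelled variable in .env usually surfaces later as a confusing connection error, so it can help to check the variables up front. The sketch below groups the variable names from the examples above by database and reports which ones are unset. The `REQUIRED_VARS` mapping, the `missing_vars` helper, and the sample values are illustrative assumptions, not part of the Letta or database SDKs:

```python
import os

# Variable names taken from the .env examples above, grouped by database.
REQUIRED_VARS = {
    "chroma": ["LETTA_API_KEY", "CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE"],
    "mongodb": ["LETTA_API_KEY", "MONGODB_URI", "MONGODB_DB_NAME", "HF_API_KEY"],
    "qdrant": ["LETTA_API_KEY", "QDRANT_URL", "QDRANT_API_KEY", "HF_API_KEY"],
}


def missing_vars(database: str, env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[database] if not env.get(name)]


# Example with an in-memory mapping instead of the real environment.
sample_env = {"LETTA_API_KEY": "placeholder", "QDRANT_URL": "https://xxxxx.cloud.qdrant.io"}
print(missing_vars("qdrant", sample_env))
```

In a real script you would call `load_dotenv()` first and then `missing_vars("chroma")` (or your chosen database) with no second argument, so that the check runs against the actual environment.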

Step 1: Set up the vector database

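First, you need to populate your chosen vector database with the content of the research papers. We’ll use two papers for this demo: “Attention Is All You Need” (arXiv:1706.03762) and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (arXiv:1810.04805).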

Before we begin, let’s set up our development environment:

Python
TypeScript

# Create a Python virtual environment to keep dependencies isolated
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

# Create a new Node.js project
npm init -y

# Create tsconfig.json for TypeScript configuration
cat > tsconfig.json << 'EOF'
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "skipLibCheck": true,
    "strict": true
  }
}
EOF

TypeScript users must also add the following field to package.json to enable ES modules:

"type": "module"

Download the research papers using curl with the -L flag to follow redirects:

curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf
curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf

Verify that the PDFs downloaded correctly:

file 1706.03762.pdf 1810.04805.pdf

You should see output indicating these are PDF documents, not HTML files.
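If the `file` command is not available (for example, on Windows), the same check can be approximated in Python by reading the first bytes of each file, since a PDF starts with the marker `%PDF-`. This is a minimal sketch; the helper name `looks_like_pdf` is an assumption made for illustration:

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF magic bytes."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


for name in ["1706.03762.pdf", "1810.04805.pdf"]:
    status = "OK" if looks_like_pdf(name) else "missing or not a PDF"
    print(f"{name}: {status}")
```

A file that fails this check is most likely an HTML error or redirect page, in which case re-running the curl command with the -L flag is the first thing to try.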

Install the necessary packages for your chosen database:

Python
TypeScript

letta-client
chromadb
pypdf
python-dotenv

npm install @letta-ai/letta-client chromadb @chroma-core/default-embed dotenv pdf-ts
npm install --save-dev typescript @types/node tsx

TypeScript installation issue: If you encounter errors during installation (particularly with the sharp dependency), try installing with prebuilt binaries:

rm -rf node_modules package-lock.json
npm install @letta-ai/letta-client chromadb @chroma-core/default-embed dotenv pdf-ts sharp --ignore-scripts
npm install --save-dev typescript @types/node ts-node tsx

For Python, save the package list above as requirements.txt, then install the packages with the following command:

pip install -r requirements.txt

Python
TypeScript

letta-client
pymongo
pypdf
python-dotenv
requests
certifi
dnspython

npm install @letta-ai/letta-client mongodb dotenv pdf-ts node-fetch
npm install --save-dev typescript @types/node tsx

For Python, save the package list above as requirements.txt, then install the packages with the following command:

pip install -r requirements.txt

Python
TypeScript

letta-client
qdrant-client
pypdf
python-dotenv
requests

npm install @letta-ai/letta-client @qdrant/js-client-rest dotenv node-fetch pdf-ts
npm install --save-dev typescript @types/node tsx

For Python, save the package list above as requirements.txt, then install the packages with the following command:

pip install -r requirements.txt

Now create a setup.py file (Python) or a setup.ts file (TypeScript) to load the PDFs, split them into chunks, and ingest them into your database:

Python
TypeScript

import os
import chromadb
import pypdf
from dotenv import load_dotenv

load_dotenv()

def main():
    # Connect to ChromaDB Cloud
    client = chromadb.CloudClient(
        tenant=os.getenv("CHROMA_TENANT"),
        database=os.getenv("CHROMA_DATABASE"),
        api_key=os.getenv("CHROMA_API_KEY")
    )

    # Create or get the collection
    collection = client.get_or_create_collection("rag_collection")

    # Ingest PDFs
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if not text or not text.strip():  # Skip empty pages
                continue
            collection.add(
                ids=[f"{pdf_file}-{i}"],
                documents=[text]
            )

    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count()}")

if __name__ == "__main__":
    main()

import { CloudClient } from 'chromadb';
import { DefaultEmbeddingFunction } from '@chroma-core/default-embed';
import * as dotenv from 'dotenv';
import * as path from 'path';
import * as fs from 'fs';
import { pdfToPages } from 'pdf-ts';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

dotenv.config();

async function main() {
    // Connect to ChromaDB Cloud
    const client = new CloudClient({
        apiKey: process.env.CHROMA_API_KEY || '',
        tenant: process.env.CHROMA_TENANT || '',
        database: process.env.CHROMA_DATABASE || ''
    });

    // Create embedding function
    const embedder = new DefaultEmbeddingFunction();

    // Create or get the collection
    const collection = await client.getOrCreateCollection({
        name: 'rag_collection',
        embeddingFunction: embedder
    });

    // Ingest PDFs
    const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf'];

    for (const pdfFile of pdfFiles) {
        console.log(`Ingesting ${pdfFile}...`);
        const pdfPath = path.join(__dirname, pdfFile);
        const dataBuffer = fs.readFileSync(pdfPath);

        const pages = await pdfToPages(dataBuffer);

        for (let i = 0; i < pages.length; i++) {
            const text = pages[i].text.trim();
            if (text) {
                await collection.add({
                    ids: [`${pdfFile}-${i}`],
                    documents: [text]
                });
            }
        }
    }

    console.log('\nIngestion complete!');
    const count = await collection.count();
    console.log(`Total documents in collection: ${count}`);
}

main().catch(console.error);

Python
TypeScript

import os
import pymongo
import pypdf
import requests
import certifi
from dotenv import load_dotenv

load_dotenv()

def get_embedding(text, api_key):
    """Get embedding from Hugging Face Inference API"""
    API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5"
    headers = {"Authorization": f"Bearer {api_key}"}

    response = requests.post(API_URL, headers=headers, json={"inputs": [text], "options": {"wait_for_model": True}})

    if response.status_code == 200:
        return response.json()[0]
    else:
        raise Exception(f"HF API error: {response.status_code} - {response.text}")

def main():
    hf_api_key = os.getenv("HF_API_KEY")
    mongodb_uri = os.getenv("MONGODB_URI")
    db_name = os.getenv("MONGODB_DB_NAME")

    if not all([hf_api_key, mongodb_uri, db_name]):
        print("Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file")
        return

    # Connect to MongoDB Atlas using certifi
    client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where())
    db = client[db_name]
    collection = db["rag_collection"]

    # Ingest PDFs
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if not text: # Skip empty pages
                continue

            # Generate embedding using Hugging Face
            print(f"  Processing page {i+1}...")
            try:
                embedding = get_embedding(text, hf_api_key)
                collection.insert_one({
                    "_id": f"{pdf_file}-{i}",
                    "text": text,
                    "embedding": embedding,
                    "source": pdf_file,
                    "page": i
                })
            except Exception as e:
                print(f"    Could not process page {i+1}: {e}")


    print("\nIngestion complete!")
    print(f"Total documents in collection: {collection.count_documents({})}")

    # Create vector search index
    print("\nNext: Go to your MongoDB Atlas dashboard and create a search index named 'vector_index'")
    print("Use the following JSON definition:")
    print('''{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}''')

if __name__ == "__main__":
    main()

import { MongoClient } from 'mongodb';
import * as dotenv from 'dotenv';
import { pdfToPages } from 'pdf-ts';
import * as fs from 'fs';
import fetch from 'node-fetch';

dotenv.config();

async function getEmbedding(text: string, apiKey: string): Promise<number[]> {
    const API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5";
    const headers = {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json"
    };

    const response = await fetch(API_URL, {
        method: 'POST',
        headers: headers,
        body: JSON.stringify({
            inputs: [text],
            options: { wait_for_model: true }
        })
    });

    if (response.ok) {
        const result: any = await response.json();
        return result[0];
    } else {
        const errorText = await response.text();
        throw new Error(`HF API error: ${response.status} - ${errorText}`);
    }
}

async function main() {
    const hfApiKey = process.env.HF_API_KEY || '';
    const mongoUri = process.env.MONGODB_URI || '';
    const dbName = process.env.MONGODB_DB_NAME || '';

    if (!hfApiKey || !mongoUri || !dbName) {
        console.error('Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file');
        return;
    }

    // Connect to MongoDB Atlas
    const client = new MongoClient(mongoUri);

    try {
        await client.connect();
        console.log('Connected to MongoDB Atlas');

        const db = client.db(dbName);
        const collection = db.collection('rag_collection');

        // Ingest PDFs
        const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf'];

        for (const pdfFile of pdfFiles) {
            console.log(`Ingesting ${pdfFile}...`);

            const dataBuffer = fs.readFileSync(pdfFile);
            const pages = await pdfToPages(dataBuffer);

            for (let i = 0; i < pages.length; i++) {
                const text = pages[i].text;

                if (!text || text.trim().length === 0) {
                    continue; // Skip empty pages
                }

                // Generate embedding using Hugging Face
                console.log(`  Processing page ${i + 1}...`);
                try {
                    const embedding = await getEmbedding(text, hfApiKey);

                    await collection.insertOne({
                        _id: `${pdfFile}-${i}`,
                        text: text,
                        embedding: embedding,
                        source: pdfFile,
                        page: i
                    });
                } catch (error) {
                    console.log(`    Could not process page ${i + 1}: ${error}`);
                }
            }
        }

        const docCount = await collection.countDocuments({});
        console.log('\nIngestion complete!');
        console.log(`Total documents in collection: ${docCount}`);

        console.log('\nNext: Go to your MongoDB Atlas dashboard and create a search index named "vector_index"');
        console.log(JSON.stringify({
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 384,
                    "similarity": "cosine"
                }
            ]
        }, null, 2));

    } catch (error) {
        console.error('Error:', error);
    } finally {
        await client.close();
    }
}

main();

Python
TypeScript

import os
import pypdf
import requests
from dotenv import load_dotenv
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

load_dotenv()

def get_embedding(text, api_key):
    """Get embedding from Hugging Face Inference API"""
    API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5"
    headers = {"Authorization": f"Bearer {api_key}"}

    response = requests.post(API_URL, headers=headers, json={"inputs": text, "options": {"wait_for_model": True}})

    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"HF API error: {response.status_code} - {response.text}")

def main():
    hf_api_key = os.getenv("HF_API_KEY")

    if not hf_api_key:
        print("Error: HF_API_KEY not found in .env file")
        return

    # Connect to Qdrant Cloud
    client = QdrantClient(
        url=os.getenv("QDRANT_URL"),
        api_key=os.getenv("QDRANT_API_KEY")
    )

    # Create collection
    collection_name = "rag_collection"

    # Check if collection exists, if not create it
    collections = client.get_collections().collections
    if collection_name not in [c.name for c in collections]:
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=384, distance=Distance.COSINE)
        )

    # Ingest PDFs
    pdf_files = ["1706.03762.pdf", "1810.04805.pdf"]
    point_id = 0

    for pdf_file in pdf_files:
        print(f"Ingesting {pdf_file}...")
        reader = pypdf.PdfReader(pdf_file)
        for i, page in enumerate(reader.pages):
            text = page.extract_text()
            if not text or not text.strip():  # Skip empty pages
                continue

            # Generate embedding using Hugging Face
            print(f"  Processing page {i+1}...")
            embedding = get_embedding(text, hf_api_key)

            client.upsert(
                collection_name=collection_name,
                points=[
                    PointStruct(
                        id=point_id,
                        vector=embedding,
                        payload={"text": text, "source": pdf_file, "page": i}
                    )
                ]
            )
            point_id += 1

    print("\nIngestion complete!")
    collection_info = client.get_collection(collection_name)
    print(f"Total documents in collection: {collection_info.points_count}")

if __name__ == "__main__":
    main()

import { QdrantClient } from '@qdrant/js-client-rest';
import { pdfToPages } from 'pdf-ts';
import dotenv from 'dotenv';
import fetch from 'node-fetch';
import * as fs from 'fs';

dotenv.config();

async function getEmbedding(text: string, apiKey: string): Promise<number[]> {
    const API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5";

    const response = await fetch(API_URL, {
        method: 'POST',
        headers: {
            "Authorization": `Bearer ${apiKey}`,
            "Content-Type": "application/json"
        },
        body: JSON.stringify({
            inputs: [text],
            options: { wait_for_model: true }
        })
    });

    if (response.ok) {
        const result: any = await response.json();
        return result[0];
    } else {
        const error = await response.text();
        throw new Error(`HuggingFace API error: ${response.status} - ${error}`);
    }
}

async function main() {
    const hfApiKey = process.env.HF_API_KEY || '';

    if (!hfApiKey) {
        console.error('Error: HF_API_KEY not found in .env file');
        return;
    }

    // Connect to Qdrant Cloud
    const client = new QdrantClient({
        url: process.env.QDRANT_URL || '',
        apiKey: process.env.QDRANT_API_KEY || ''
    });

    const collectionName = 'rag_collection';

    // Check if collection exists, if not create it
    const collections = await client.getCollections();
    const collectionExists = collections.collections.some(c => c.name === collectionName);

    if (!collectionExists) {
        console.log('Creating collection...');
        await client.createCollection(collectionName, {
            vectors: {
                size: 384,
                distance: 'Cosine'
            }
        });
    }

    // Ingest PDFs
    const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf'];
    let pointId = 0;

    for (const pdfFile of pdfFiles) {
        console.log(`\nIngesting ${pdfFile}...`);
        const dataBuffer = fs.readFileSync(pdfFile);
        const pages = await pdfToPages(dataBuffer);

        for (let i = 0; i < pages.length; i++) {
            const text = pages[i].text;
            if (!text || text.trim().length === 0) {
                continue; // Skip empty pages
            }

            console.log(`  Processing page ${i + 1}...`);
            const embedding = await getEmbedding(text, hfApiKey);

            await client.upsert(collectionName, {
                wait: true,
                points: [
                    {
                        id: pointId,
                        vector: embedding,
                        payload: {
                            text: text,
                            source: pdfFile,
                            page: i
                        }
                    }
                ]
            });
            pointId++;
        }
    }

    console.log('\nIngestion complete!');
    const collectionInfo = await client.getCollection(collectionName);
    console.log(`Total documents in collection: ${collectionInfo.points_count}`);
}

main().catch(console.error);

Run the script from your terminal:

Python
TypeScript

python setup.py

npx tsx setup.ts
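The setup scripts store one chunk per PDF page. If whole pages turn out to be too coarse for your queries, one common alternative is to split each page into fixed-size windows that overlap slightly, so that a sentence cut at a boundary still appears intact in a neighbouring chunk. The sketch below is not part of the tutorial's scripts; the helper name `split_into_chunks` and the default sizes are assumptions you would tune for your own data:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


page_text = "abcdefghij" * 3  # 30 characters standing in for a page
for index, chunk in enumerate(split_into_chunks(page_text, chunk_size=12, overlap=4)):
    print(index, repr(chunk))
```

If you adopt this, each chunk needs its own ID, for example by appending the chunk index to the existing page-based ID.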

If you’re using MongoDB Atlas, manually create a vector search index by following the steps below.

Create the vector search index (MongoDB Atlas Only)

Navigate to Atlas Search

Log in to your MongoDB Atlas dashboard, navigate to your cluster, and click on the Atlas Search tab.

Create a search index

Click Create Search Index, then choose JSON Editor (not Visual Editor).

Select the database and collection

- Database: Select rag_demo (or whatever you set as MONGODB_DB_NAME).
- Collection: Select rag_collection.

Name and configure the index

- Index Name: Enter vector_index (this exact name is required by the code).
- Copy and paste this JSON definition as the configuration:
```
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}
```
Note: The 384 dimensions are for Hugging Face’s BAAI/bge-small-en-v1.5 model.

Create and wait

Click Create Search Index. The index takes a few minutes to build. Wait until it displays an Active status before proceeding.
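Because the index definition fixes the vector length at 384 (the same size is configured for the Qdrant collection), an embedding of any other shape will not match it. A nested list is a typical cause, since the Hugging Face API returns a list of vectors when the input is sent as a list. The sketch below validates a vector before it is stored or used in a query; `check_embedding` is a hypothetical helper, not part of any SDK used here:

```python
EXPECTED_DIMENSIONS = 384  # Matches numDimensions in the index definition


def check_embedding(embedding, expected: int = EXPECTED_DIMENSIONS) -> list:
    """Return the embedding as a list of floats, or raise if its shape is unexpected."""
    if not isinstance(embedding, list) or len(embedding) != expected:
        size = len(embedding) if isinstance(embedding, list) else "n/a"
        raise ValueError(f"Expected a list of {expected} numbers, got length {size}")
    if not all(isinstance(value, (int, float)) for value in embedding):
        raise ValueError("Embedding contains non-numeric values (possibly a nested list)")
    return [float(value) for value in embedding]


print(len(check_embedding([0.1] * 384)))
```

Calling this on the result of the embedding request, before the insert or the search, turns a silent mismatch into an immediate and readable error.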

Your vector database is now populated with research paper content and ready to query.

Step 2: Create a custom search tool

A Letta tool is a Python function that your agent can call. We’ll create a function that searches your vector database and returns the results. Letta handles the complexities of exposing this function to the agent securely.

Create a new file named tools.py (Python) or tools.ts (TypeScript) with the appropriate implementation for your database:

Python
TypeScript

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant document found.
    """
    import chromadb
    import os

    # ChromaDB Cloud Client
    # This tool code is executed on the Letta server. It expects the ChromaDB
    # credentials to be passed as environment variables.
    api_key = os.getenv("CHROMA_API_KEY")
    tenant = os.getenv("CHROMA_TENANT")
    database = os.getenv("CHROMA_DATABASE")

    if not all([api_key, tenant, database]):
        raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.")

    client = chromadb.CloudClient(
        tenant=tenant,
        database=database,
        api_key=api_key
    )

    collection = client.get_or_create_collection("rag_collection")

    try:
        results = collection.query(
            query_texts=[query_text],
            n_results=n_results
        )

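        # Return every requested result, not just the first one
        documents = results['documents'][0]
        return "\n\n".join(documents) if documents else "No results found."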
    except Exception as e:
        return f"Tool failed with error: {e}"

/**
 * This file contains the Python tool code as a string.
 * Letta tools execute in Python, so we define the Python source code here.
 */

export const searchResearchPapersToolCode = `def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant document found.
    """
    import chromadb
    import os

    # ChromaDB Cloud Client
    # This tool code is executed on the Letta server. It expects the ChromaDB
    # credentials to be passed as environment variables.
    api_key = os.getenv("CHROMA_API_KEY")
    tenant = os.getenv("CHROMA_TENANT")
    database = os.getenv("CHROMA_DATABASE")

    if not all([api_key, tenant, database]):
        raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.")

    client = chromadb.CloudClient(
        tenant=tenant,
        database=database,
        api_key=api_key
    )

    collection = client.get_or_create_collection("rag_collection")

    try:
        results = collection.query(
            query_texts=[query_text],
            n_results=n_results
        )

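        # Return every requested result, not just the first one
        documents = results['documents'][0]
        return "\\n\\n".join(documents) if documents else "No results found."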
    except Exception as e:
        return f"Tool failed with error: {e}"
`;

Python
TypeScript

import os

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query using Hugging Face embeddings.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found.
    """
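    # Imported inside the function, like the other tool variants, so the function is self-contained
    import os
    import requests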
    import pymongo
    import certifi

    try:
        n_results = int(n_results)
    except (ValueError, TypeError):
        n_results = 1

    mongodb_uri = os.getenv("MONGODB_URI")
    db_name = os.getenv("MONGODB_DB_NAME")
    hf_api_key = os.getenv("HF_API_KEY")

    if not all([mongodb_uri, db_name, hf_api_key]):
        raise ValueError("MONGODB_URI, MONGODB_DB_NAME, and HF_API_KEY must be set as environment variables.")

    # --- Hugging Face API Call ---
    try:
        response = requests.post(
            "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5",
            headers={"Authorization": f"Bearer {hf_api_key}"},
            json={"inputs": [query_text], "options": {"wait_for_model": True}},
            timeout=30
        )
        response.raise_for_status()
        query_embedding = response.json()[0]
    except requests.exceptions.RequestException as e:
        return f"Hugging Face API request failed: {e}"

    # --- MongoDB Atlas Connection & Search ---
    try:
        client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where(), serverSelectionTimeoutMS=30000)
        collection = client[db_name]["rag_collection"]
        pipeline = [
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "embedding",
                    "queryVector": query_embedding,
                    "numCandidates": 100,
                    "limit": n_results
                }
            },
            {
                "$project": {
                    "text": 1,
                    "source": 1,
                    "page": 1,
                    "score": {"$meta": "vectorSearchScore"}
                }
            }
        ]
        results = list(collection.aggregate(pipeline))
    except pymongo.errors.PyMongoError as e:
        return f"MongoDB operation failed: {e}"

    # --- Final Processing ---
    documents = [doc.get("text", "") for doc in results]
    return "\n\n".join(documents) if documents else "No results found."

/**
 * This file contains the Python tool code as a string.
 * Letta tools execute in Python, so we define the Python source code here.
 */

export const searchResearchPapersToolCode = `import os

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query using Hugging Face embeddings.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found.
    """
    import requests
    import pymongo
    import certifi

    try:
        n_results = int(n_results)
    except (ValueError, TypeError):
        n_results = 1

    mongodb_uri = os.getenv("MONGODB_URI")
    db_name = os.getenv("MONGODB_DB_NAME")
    hf_api_key = os.getenv("HF_API_KEY")

    if not all([mongodb_uri, db_name, hf_api_key]):
        raise ValueError("MONGODB_URI, MONGODB_DB_NAME, and HF_API_KEY must be set as environment variables.")

    # --- Hugging Face API Call ---
    try:
        response = requests.post(
            "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5",
            headers={"Authorization": f"Bearer {hf_api_key}"},
            json={"inputs": [query_text], "options": {"wait_for_model": True}},
            timeout=30
        )
        response.raise_for_status()
        query_embedding = response.json()[0]
    except requests.exceptions.RequestException as e:
        return f"Hugging Face API request failed: {e}"

    # --- MongoDB Atlas Connection & Search ---
    try:
        client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where(), serverSelectionTimeoutMS=30000)
        collection = client[db_name]["rag_collection"]
        pipeline = [
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "embedding",
                    "queryVector": query_embedding,
                    "numCandidates": 100,
                    "limit": n_results
                }
            },
            {
                "$project": {
                    "text": 1,
                    "source": 1,
                    "page": 1,
                    "score": {"$meta": "vectorSearchScore"}
                }
            }
        ]
        results = list(collection.aggregate(pipeline))
    except pymongo.errors.PyMongoError as e:
        return f"MongoDB operation failed: {e}"

    # --- Final Processing ---
    documents = [doc.get("text", "") for doc in results]
    return "\\n\\n".join(documents) if documents else "No results found."
`;

Python
TypeScript

def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query using Hugging Face embeddings.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found.
    """
    import os
    import requests
    from qdrant_client import QdrantClient

    # Qdrant Cloud Client
    url = os.getenv("QDRANT_URL")
    api_key = os.getenv("QDRANT_API_KEY")
    hf_api_key = os.getenv("HF_API_KEY")

    if not all([url, api_key, hf_api_key]):
        raise ValueError("QDRANT_URL, QDRANT_API_KEY, and HF_API_KEY must be set as environment variables.")

    # Connect to Qdrant
    client = QdrantClient(url=url, api_key=api_key)

    try:
        # Generate embedding using Hugging Face
        API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5"
        headers = {"Authorization": f"Bearer {hf_api_key}"}
        response = requests.post(API_URL, headers=headers, json={"inputs": query_text, "options": {"wait_for_model": True}})

        if response.status_code != 200:
            return f"HF API error: {response.status_code}"

        query_embedding = response.json()

        # Search Qdrant
        results = client.query_points(
            collection_name="rag_collection",
            query=query_embedding,
            limit=n_results
        )

        documents = [hit.payload["text"] for hit in results.points]
        return "\n\n".join(documents) if documents else "No results found."
    except Exception as e:
        return f"Tool failed with error: {e}"

/**
 * This file contains the Python tool code as a string.
 * Letta tools execute in Python, so we define the Python source code here.
 */

export const searchResearchPapersToolCode = `def search_research_papers(query_text: str, n_results: int = 1) -> str:
    """
    Searches the research paper collection for a given query using Hugging Face embeddings.

    Args:
        query_text (str): The text to search for.
        n_results (int): The number of results to return.

    Returns:
        str: The most relevant documents found.
    """
    import os
    import requests
    from qdrant_client import QdrantClient

    # Qdrant Cloud Client
    url = os.getenv("QDRANT_URL")
    api_key = os.getenv("QDRANT_API_KEY")
    hf_api_key = os.getenv("HF_API_KEY")

    if not all([url, api_key, hf_api_key]):
        raise ValueError("QDRANT_URL, QDRANT_API_KEY, and HF_API_KEY must be set as environment variables.")

    # Connect to Qdrant
    client = QdrantClient(url=url, api_key=api_key)

    try:
        # Generate embedding using Hugging Face
        API_URL = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5"
        headers = {"Authorization": f"Bearer {hf_api_key}"}
        response = requests.post(API_URL, headers=headers, json={"inputs": query_text, "options": {"wait_for_model": True}})

        if response.status_code != 200:
            return f"HF API error: {response.status_code}"

        query_embedding = response.json()

        # Search Qdrant
        results = client.query_points(
            collection_name="rag_collection",
            query=query_embedding,
            limit=n_results
        )

        documents = [hit.payload["text"] for hit in results.points]
        return "\\n\\n".join(documents) if documents else "No results found."
    except Exception as e:
        return f"Tool failed with error: {e}"
`;

This function takes a query, connects to your database, retrieves the most relevant documents, and returns their text as a single string, with documents separated by blank lines. If nothing matches, it returns "No results found."
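The Qdrant version of the tool passes `response.json()` straight to the vector search as the query vector. Depending on the model and the input shape, an embedding endpoint may return either a flat list of floats or a nested list holding one vector per input; that is an assumption about the API's behavior, not something this tutorial states. The sketch below shows one defensive way to handle both shapes. `to_query_vector` is a hypothetical helper, not part of the Letta, Qdrant, or Hugging Face libraries.

```python
from typing import Any, List


def to_query_vector(payload: Any) -> List[float]:
    """Return a flat list of floats from a flat or singly nested embedding payload.

    Hypothetical helper for illustration; not part of any library API.
    """
    # Unwrap a batch that holds exactly one vector: [[0.1, 0.2]] -> [0.1, 0.2]
    if isinstance(payload, list) and len(payload) == 1 and isinstance(payload[0], list):
        payload = payload[0]

    if not isinstance(payload, list) or not payload:
        raise ValueError("Expected a non-empty list from the embedding endpoint.")

    # Reject anything that is not a plain number (bool is excluded on purpose).
    for value in payload:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("Expected a flat list of numbers.")

    return [float(value) for value in payload]


print(to_query_vector([[1, 2, 3]]))
```

Because the tool imports its dependencies inside the function body, a helper like this would likely need to be inlined in `search_research_papers` rather than defined at module level.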

Step 3: Configure an agentic research assistant

Next, we’ll create a new agent. Its persona instructs it to search the research papers before answering. We’ll attach our new search tool to the agent, declare the tool’s pip dependencies, and pass the database credentials as agent secrets, all programmatically.

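Create a file named create_agentic_agent.py (Python) or create_agentic_agent.ts (TypeScript). The three versions below target ChromaDB, MongoDB Atlas, and Qdrant, in that order; use the one that matches your vector database: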

Python
TypeScript

import os
from letta_client import Letta
from dotenv import load_dotenv
from tools import search_research_papers

load_dotenv()

# Initialize the Letta client
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create a tool from our Python function with dependencies configured
search_tool = client.tools.create_from_function(
    func=search_research_papers,
    pip_requirements=[{"name": "chromadb"}]
)

# Define the agent's persona
persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool."""

# Create the agent with the tool and environment variables
agent = client.agents.create(
    name="Agentic RAG Assistant",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    description="A smart agent that can search a vector database to answer questions.",
    memory_blocks=[
        {
            "label": "persona",
            "value": persona
        }
    ],
    tools=[search_tool.name],
    secrets={
        "CHROMA_API_KEY": os.getenv("CHROMA_API_KEY"),
        "CHROMA_TENANT": os.getenv("CHROMA_TENANT"),
        "CHROMA_DATABASE": os.getenv("CHROMA_DATABASE")
    }
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

import Letta from '@letta-ai/letta-client';
import * as dotenv from 'dotenv';
import { searchResearchPapersToolCode } from './tools.js';

dotenv.config();

async function main() {
    // Initialize the Letta client
    const client = new Letta({
        apiKey: process.env.LETTA_API_KEY || ''
    });

    // Create the tool from the Python code with dependencies configured
    const searchTool = await client.tools.create({
        source_code: searchResearchPapersToolCode,
        source_type: 'python',
        pip_requirements: [{ name: 'chromadb' }]
    });

    console.log(`Tool '${searchTool.name}' created with ID: ${searchTool.id}`);

    // Define the agent's persona
    const persona = `You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the \`search_research_papers\` tool to find relevant information. Then, answer the user's question based on the information returned by the tool.`;

    // Create the agent with the tool and environment variables
    const agent = await client.agents.create({
        name: 'Agentic RAG Assistant',
        model: 'openai/gpt-4o-mini',
        embedding: 'openai/text-embedding-3-small',
        description: 'A smart agent that can search a vector database to answer questions.',
        memory_blocks: [
            {
                label: 'persona',
                value: persona
            }
        ],
        tool_ids: [searchTool.id],
        secrets: {
            CHROMA_API_KEY: process.env.CHROMA_API_KEY || '',
            CHROMA_TENANT: process.env.CHROMA_TENANT || '',
            CHROMA_DATABASE: process.env.CHROMA_DATABASE || ''
        }
    });

    console.log(`Agent '${agent.name}' created with ID: ${agent.id}`);
}

main().catch(console.error);

Python
TypeScript

import os
from letta_client import Letta
from dotenv import load_dotenv
from tools import search_research_papers

load_dotenv()

# Initialize the Letta client
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create a tool from our Python function with dependencies configured
search_tool = client.tools.create_from_function(
    func=search_research_papers,
    pip_requirements=[
        {"name": "pymongo"},
        {"name": "requests"},
        {"name": "certifi"},
        {"name": "dnspython"}
    ]
)

# Define the agent's persona
persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool."""

# Create the agent with the tool and environment variables
agent = client.agents.create(
    name="Agentic RAG Assistant",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    description="A smart agent that can search a vector database to answer questions.",
    memory_blocks=[
        {
            "label": "persona",
            "value": persona
        }
    ],
    tools=[search_tool.name],
    secrets={
        "MONGODB_URI": os.getenv("MONGODB_URI"),
        "MONGODB_DB_NAME": os.getenv("MONGODB_DB_NAME"),
        "HF_API_KEY": os.getenv("HF_API_KEY")
    }
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

import Letta from '@letta-ai/letta-client';
import * as dotenv from 'dotenv';
import { searchResearchPapersToolCode } from './tools.js';

dotenv.config();

async function main() {
    // Initialize the Letta client
    const client = new Letta({
        apiKey: process.env.LETTA_API_KEY || ''
    });

    // Create the tool from the Python code with dependencies configured
    const searchTool = await client.tools.create({
        source_code: searchResearchPapersToolCode,
        source_type: 'python',
        pip_requirements: [
            { name: 'pymongo' },
            { name: 'requests' },
            { name: 'certifi' },
            { name: 'dnspython' }
        ]
    });

    console.log(`Tool '${searchTool.name}' created with ID: ${searchTool.id}`);

    // Define the agent's persona
    const persona = `You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the \`search_research_papers\` tool to find relevant information. Then, answer the user's question based on the information returned by the tool.`;

    // Create the agent with the tool and environment variables
    const agent = await client.agents.create({
        name: 'Agentic RAG Assistant',
        model: 'openai/gpt-4o-mini',
        embedding: 'openai/text-embedding-3-small',
        description: 'A smart agent that can search a vector database to answer questions.',
        memory_blocks: [
            {
                label: 'persona',
                value: persona
            }
        ],
        tool_ids: [searchTool.id],
        secrets: {
            MONGODB_URI: process.env.MONGODB_URI || '',
            MONGODB_DB_NAME: process.env.MONGODB_DB_NAME || '',
            HF_API_KEY: process.env.HF_API_KEY || ''
        }
    });

    console.log(`Agent '${agent.name}' created with ID: ${agent.id}`);
}

main().catch(console.error);

Python
TypeScript

import os
from letta_client import Letta
from dotenv import load_dotenv
from tools import search_research_papers

load_dotenv()

# Initialize the Letta client
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create a tool from our Python function with dependencies configured
search_tool = client.tools.create_from_function(
    func=search_research_papers,
    pip_requirements=[
        {"name": "qdrant-client"},
        {"name": "requests"}
    ]
)

# Define the agent's persona
persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool."""

# Create the agent with the tool and environment variables
agent = client.agents.create(
    name="Agentic RAG Assistant",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    description="A smart agent that can search a vector database to answer questions.",
    memory_blocks=[
        {
            "label": "persona",
            "value": persona
        }
    ],
    tools=[search_tool.name],
    secrets={
        "QDRANT_URL": os.getenv("QDRANT_URL"),
        "QDRANT_API_KEY": os.getenv("QDRANT_API_KEY"),
        "HF_API_KEY": os.getenv("HF_API_KEY")
    }
)

print(f"Agent '{agent.name}' created with ID: {agent.id}")

import Letta from '@letta-ai/letta-client';
import * as dotenv from 'dotenv';
import { searchResearchPapersToolCode } from './tools.js';

dotenv.config();

async function main() {
    // Initialize the Letta client
    const client = new Letta({
        apiKey: process.env.LETTA_API_KEY || ''
    });

    // Create the tool from the Python code with dependencies configured
    const searchTool = await client.tools.create({
        source_code: searchResearchPapersToolCode,
        source_type: 'python',
        pip_requirements: [
            { name: 'qdrant-client' },
            { name: 'requests' }
        ]
    });

    console.log(`Tool '${searchTool.name}' created with ID: ${searchTool.id}`);

    // Define the agent's persona
    const persona = `You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the \`search_research_papers\` tool to find relevant information. Then, answer the user's question based on the information returned by the tool.`;

    // Create the agent with the tool and environment variables
    const agent = await client.agents.create({
        name: 'Agentic RAG Assistant',
        model: 'openai/gpt-4o-mini',
        embedding: 'openai/text-embedding-3-small',
        description: 'A smart agent that can search a vector database to answer questions.',
        memory_blocks: [
            {
                label: 'persona',
                value: persona
            }
        ],
        tool_ids: [searchTool.id],
        secrets: {
            QDRANT_URL: process.env.QDRANT_URL || '',
            QDRANT_API_KEY: process.env.QDRANT_API_KEY || '',
            HF_API_KEY: process.env.HF_API_KEY || ''
        }
    });

    console.log(`Agent '${agent.name}' created with ID: ${agent.id}`);
}

main().catch(console.error);

Run this script once to create the agent in your Letta project:

Python
TypeScript

python create_agentic_agent.py

npx tsx create_agentic_agent.ts

Your agent is now fully configured: both the tool’s pip dependencies and the environment variables it reads are set programmatically. The script prints the new agent’s ID; copy it, because the next step needs it.
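The creation scripts pass `os.getenv(...)` values directly into `secrets`, so a variable missing from your `.env` file becomes `None` in Python (or an empty string in TypeScript) and only surfaces later, when the tool raises its `ValueError` inside the agent. A minimal sketch of checking up front is shown below; `collect_secrets` is a hypothetical helper name, and the variable names in the usage line are the Qdrant ones from this tutorial.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def collect_secrets(
    names: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return {name: value} for every name, or raise if any is unset or empty.

    Hypothetical helper for illustration; not part of the Letta client.
    """
    env = os.environ if environ is None else environ
    wanted = list(names)  # materialize once so generators also work

    missing = [name for name in wanted if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return {name: env[name] for name in wanted}


# Example with an explicit mapping instead of the real environment.
example_env = {"QDRANT_URL": "https://example.invalid", "QDRANT_API_KEY": "k", "HF_API_KEY": "h"}
print(sorted(collect_secrets(["QDRANT_URL", "QDRANT_API_KEY", "HF_API_KEY"], example_env)))
```

In the creation script, the result could be passed as `secrets=collect_secrets([...])` after `load_dotenv()` has run, so a misconfigured environment fails before any agent is created.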

Step 4: Let the agent lead the conversation

With the agentic setup, our client-side code becomes much simpler. We no longer retrieve context ourselves: we send the user’s raw question to the agent and let it decide when to search and what to search for.

Create a file named agentic_rag.py (Python) or agentic_rag.ts (TypeScript):

Python
TypeScript

import os
from letta_client import Letta
from dotenv import load_dotenv

load_dotenv()

# Initialize client
letta_client = Letta(api_key=os.getenv("LETTA_API_KEY"))

AGENT_ID = "your-agentic-agent-id"  # Replace with your new agent ID

def main():
    while True:
        user_query = input('\nAsk a question about the research papers (or type "exit" to quit): ')
        if user_query.strip().lower() in ['exit', 'quit']:
            break

        response = letta_client.agents.messages.create(
            agent_id=AGENT_ID,
            messages=[{"role": "user", "content": user_query}]
        )

        for message in response.messages:
            if message.message_type == 'assistant_message':
                print(f"\nAgent: {message.content}")

if __name__ == "__main__":
    main()

import Letta from '@letta-ai/letta-client';
import * as dotenv from 'dotenv';
import * as readline from 'readline';

dotenv.config();

const AGENT_ID = 'your-agentic-agent-id';  // Replace with your new agent ID

async function main() {
    // Initialize client
    const client = new Letta({
        apiKey: process.env.LETTA_API_KEY || ''
    });

    const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });

    const askQuestion = (query: string): Promise<string> => {
        return new Promise((resolve) => {
            rl.question(query, resolve);
        });
    };

    while (true) {
        const userQuery = await askQuestion('\nAsk a question about the research papers (or type "exit" to quit): ');

        if (userQuery.toLowerCase() === 'exit' || userQuery.toLowerCase() === 'quit') {
            rl.close();
            break;
        }

        const response = await client.agents.messages.create(AGENT_ID, {
            messages: [{ role: 'user', content: userQuery }]
        });

        for (const message of response.messages) {
            if (message.message_type === 'assistant_message') {
                console.log(`\nAgent: ${(message as any).content}`);
            }
        }
    }
}

main().catch(console.error);

When you run this script, the agent receives the question, recognizes from its persona that it should search first, calls the search_research_papers tool, reads the returned context, and then formulates an answer. All the RAG logic lives in the agent, not in your application.
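To see that sequence for yourself, you can print more than the final assistant message. The sketch below assumes the response also contains messages whose `message_type` is `tool_call_message` (with a `tool_call` carrying `name` and `arguments`) and `tool_return_message` (with a `tool_return`); these field names are assumptions to check against your Letta client version. `summarize_turn` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without a Letta account.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def summarize_turn(messages: Iterable[Any]) -> List[str]:
    """Return one readable line per tool call, tool return, and assistant reply."""
    lines: List[str] = []
    for message in messages:
        kind = getattr(message, "message_type", None)
        if kind == "tool_call_message":
            call = message.tool_call
            lines.append(f"tool call: {call.name}({call.arguments})")
        elif kind == "tool_return_message":
            # Truncate long retrieved passages so the trace stays readable.
            lines.append(f"tool return: {str(message.tool_return)[:80]}")
        elif kind == "assistant_message":
            lines.append(f"agent: {message.content}")
        # Any other message type is skipped.
    return lines


# Stand-in messages (not real Letta output) that mimic one agent turn.
fake_messages = [
    SimpleNamespace(
        message_type="tool_call_message",
        tool_call=SimpleNamespace(
            name="search_research_papers",
            arguments='{"query_text": "sparse attention"}',
        ),
    ),
    SimpleNamespace(message_type="tool_return_message", tool_return="Paper excerpt..."),
    SimpleNamespace(message_type="assistant_message", content="Here is what I found."),
]

for line in summarize_turn(fake_messages):
    print(line)
```

In agentic_rag.py, the same function could be called on `response.messages` in place of the loop that prints only assistant messages.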

Next steps

Now that you’ve integrated agentic RAG with Letta, you can expand on this foundation.

Simple RAG

Learn how to manage retrieval on the client side for complete control.

Custom tools

Explore creating more advanced custom tools for your agents.