Building a Voice Agent with Letta and Vapi

A complete guide to creating a voice-enabled AI agent using Letta for conversational AI and Vapi for voice infrastructure.

This guide will show you how to:

  • Create a conversational AI agent with Letta
  • Connect it to Vapi for voice capabilities
  • Make phone calls and web-based voice interactions with your agent

Architecture:

graph LR
    A[User Voice/Phone] --> B[Vapi]
    B --> C[Letta Agent]
    C --> D[Response]
    D --> B
    B --> A
Prerequisites:

  1. Letta Account - app.letta.com (free tier available)
  2. Vapi Account - vapi.ai (paid service, ~$0.05/minute)
  • Python 3.8+ installed
  • Terminal/command line access
  • Text editor or IDE

Step 1: Set Up Your Development Environment

mkdir letta-voice-agent
cd letta-voice-agent

# Create virtual environment
python3 -m venv venv
# Activate it
source venv/bin/activate

Create requirements.txt:

letta-client>=0.1.319
python-dotenv>=1.0.0
requests>=2.31.0

Install packages:

pip install -r requirements.txt

Create .env:

# Letta API Configuration
LETTA_API_KEY=your_letta_api_key_here
# Vapi API Configuration
VAPI_API_KEY=your_vapi_private_key_here
# Will be filled in later
LETTA_AGENT_ID=
VAPI_ASSISTANT_ID=
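Scripts in this guide read these variables at runtime via python-dotenv. A minimal startup check that fails fast when a key is absent can look like this (the `missing_keys` helper is illustrative, not part of any library):

```python
import os

REQUIRED = ["LETTA_API_KEY", "VAPI_API_KEY"]

def missing_keys(env=None):
    """Return the required variable names that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Illustration with a fake environment:
print(missing_keys({"LETTA_API_KEY": "sk-test"}))  # ['VAPI_API_KEY']
```

In a real script, call `load_dotenv()` first so the values from .env are loaded into `os.environ`.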

Create .gitignore:

.env
venv/
__pycache__/
*.pyc
Get your Letta API key:

  1. Go to app.letta.com/settings
  2. Navigate to API Keys tab
  3. Click Create New Key (or copy existing key)
  4. Copy the key and add it to .env:
LETTA_API_KEY=sk-your-actual-key-here

Create create_agent.py:

#!/usr/bin/env python3
"""
Create a Letta agent optimized for voice conversations.
"""
import os

from dotenv import load_dotenv
from letta_client import Letta

# Load environment variables
load_dotenv()

# Initialize Letta client
client = Letta(token=os.getenv('LETTA_API_KEY'))

print("Creating Letta agent...")

# Create the agent
agent = client.agents.create(
    name="Voice Assistant",
    # Memory blocks define the agent's context
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Unknown\nPreferences: Unknown"
        },
        {
            "label": "persona",
            "value": """You are a helpful AI assistant with voice capabilities.
You speak naturally and conversationally.
Keep responses concise and clear for voice interactions.
Avoid using special characters, markdown, or formatting that doesn't translate well to speech."""
        }
    ],
    # Model configuration
    model="openai/gpt-4o-mini",
    # Note: embedding config is only needed for self-hosted;
    # Letta Cloud handles this automatically
)

print("\n✅ Agent created successfully!")
print(f"Agent ID: {agent.id}")
print(f"Name: {agent.name}")
print("\n📝 Add this to your .env file:")
print(f"LETTA_AGENT_ID={agent.id}")

# Test the agent
print("\n🧪 Testing agent...")
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello! Introduce yourself briefly in one sentence."}]
)

# Display response
for message in response.messages:
    if message.message_type == "assistant_message":
        print(f"\nAgent: {message.content}")

print("\n✅ Agent is working! Ready for voice integration.")

Run it:

python create_agent.py

Copy the Agent ID and add it to .env.

Alternatively, create the agent in the Letta dashboard:

  1. Go to app.letta.com
  2. Click Create Agent
  3. Configure:
    • Name: Voice Assistant
    • Model: gpt-4o-mini (recommended for cost/performance)
    • Memory: Add persona and human blocks as shown above
  4. Click Create
  5. Copy the Agent ID from the URL (format: agent-xxxxxxxxx)
  6. Add to .env:
LETTA_AGENT_ID=agent-your-id-here
Next, get your Vapi API key:

  1. Go to dashboard.vapi.ai
  2. Navigate to Settings → API Keys
  3. Copy your Private Key (for server-side operations)
  4. Add to .env:
VAPI_API_KEY=your_vapi_private_key_here
Then connect Letta as a custom LLM provider:

  1. Go to dashboard.vapi.ai/settings/integrations
  2. Scroll to Custom LLM section
  3. Enter your Letta API key in the field
  4. Click Save
[Screenshot: Vapi custom LLM configuration]
Create the Vapi assistant:

  1. Go to dashboard.vapi.ai/assistants

  2. Click Create Assistant → Blank Template

  3. Configure the assistant:

    Model Settings:

    • Provider: Custom LLM
    • Endpoint: https://api.letta.com/v1/chat/completions
    • Model: agent-YOUR_AGENT_ID (replace with your actual agent ID)

    Voice Settings:

    • Provider: Vapi (or ElevenLabs for higher quality)
    • Voice: Kylie (or browse other options)

    Transcriber:

    • Provider: Deepgram
    • Model: nova-2
    • Language: en

    First Message:

    • Mode: Assistant speaks first
    • Message: Hello! How can I help you today?
  4. Click Save

  5. Copy the Assistant ID and add to .env

[Screenshot: Vapi model configuration with Letta]
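Before wiring up Vapi, you can sanity-check the custom-LLM endpoint directly from Python. This is a sketch using the `requests` package from requirements.txt; the `choices`-array response parsing assumes the endpoint returns a standard OpenAI-style completion:

```python
import os
import requests

LETTA_ENDPOINT = "https://api.letta.com/v1/chat/completions"

def build_request(agent_id, text, api_key):
    """Headers and body mirroring the Vapi assistant settings above."""
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {
        "model": agent_id,  # same value as Vapi's Model field, e.g. "agent-xxxxxxxxx"
        "messages": [{"role": "user", "content": text}],
    }
    return headers, body

def ask(agent_id, text):
    """Send one turn and return the assistant's reply text."""
    headers, body = build_request(agent_id, text, os.environ["LETTA_API_KEY"])
    resp = requests.post(LETTA_ENDPOINT, headers=headers, json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

If this call works but Vapi does not, the problem is in the Vapi configuration rather than the Letta side.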

The easiest way to test is directly in the Vapi dashboard:

  1. Go to your assistant in dashboard.vapi.ai/assistants
  2. Click the Talk button in the top right
  3. Allow microphone access when prompted
  4. Start speaking to test your agent
To talk to your agent over the phone:

  1. Get a Phone Number (optional, costs ~$2/month):

    • In Vapi dashboard, go to Phone Numbers
    • Click Buy Number
    • Select a number and complete purchase
  2. Assign Assistant to Number:

    • Click on your phone number
    • Select your assistant from dropdown
    • Click Save
  3. Make a Test Call:

    • Call the phone number
    • Talk with your agent!
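Outbound calls can also be started from code. The endpoint and field names below (`POST https://api.vapi.ai/call`, `assistantId`, `phoneNumberId`, `customer.number`) are assumptions about Vapi's REST API; verify them against Vapi's API reference before relying on this sketch:

```python
import os
import requests

def outbound_call_payload(assistant_id, phone_number_id, customer_number):
    """Body for an outbound call request (field names are assumptions)."""
    return {
        "assistantId": assistant_id,
        "phoneNumberId": phone_number_id,
        "customer": {"number": customer_number},  # E.164 format, e.g. "+15551234567"
    }

def start_call(assistant_id, phone_number_id, customer_number):
    """Place an outbound call via the assumed create-call endpoint."""
    resp = requests.post(
        "https://api.vapi.ai/call",  # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['VAPI_API_KEY']}"},
        json=outbound_call_payload(assistant_id, phone_number_id, customer_number),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```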

For embedding voice in your web app:

<!DOCTYPE html>
<html>
<head>
  <title>Voice Agent Test</title>
</head>
<body>
  <button id="start-call">Start Voice Call</button>
  <button id="end-call" disabled>End Call</button>

  <script src="https://cdn.jsdelivr.net/npm/@vapi-ai/web@latest/dist/index.js"></script>
  <script>
    const vapi = new Vapi('YOUR_VAPI_PUBLIC_KEY');
    const startButton = document.getElementById('start-call');
    const endButton = document.getElementById('end-call');

    startButton.addEventListener('click', async () => {
      await vapi.start('YOUR_ASSISTANT_ID');
      startButton.disabled = true;
      endButton.disabled = false;
    });

    endButton.addEventListener('click', () => {
      vapi.stop();
      startButton.disabled = false;
      endButton.disabled = true;
    });

    vapi.on('call-start', () => {
      console.log('Call started');
    });

    vapi.on('call-end', () => {
      console.log('Call ended');
      startButton.disabled = false;
      endButton.disabled = true;
    });

    vapi.on('message', (message) => {
      console.log('Message:', message);
    });
  </script>
</body>
</html>

You can modify your agent’s behavior by updating its memory blocks:

from letta_client import Letta
import os
from dotenv import load_dotenv

load_dotenv()

client = Letta(token=os.getenv('LETTA_API_KEY'))
agent_id = os.getenv('LETTA_AGENT_ID')

# Get current memory blocks
blocks = client.agents.blocks.list(agent_id=agent_id)

# Find the persona block
persona_block = next(b for b in blocks if b.label == "persona")

# Update it
client.agents.blocks.update(
    agent_id=agent_id,
    block_id=persona_block.id,
    value="""You are a helpful tutor specializing in physics.
Keep explanations clear and concise for voice conversations.
Use analogies and real-world examples to make concepts accessible.
Ask clarifying questions when needed."""
)

print("✅ Persona updated!")

In the Vapi dashboard or via API:

Available Voice Providers:

  • Vapi - Fast, low latency (included)
  • ElevenLabs - High quality, natural ($0.30/1K chars additional)
  • Azure - Microsoft voices
  • PlayHT - Wide variety of voices
  • Deepgram - Ultra-fast, good quality

To change voice in the dashboard:

  1. Go to your assistant
  2. Click Edit
  3. Scroll to Voice section
  4. Select provider and voice
  5. Click Save
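The same change can be made programmatically. The endpoint shape (`PATCH https://api.vapi.ai/assistant/{id}`) and the `provider`/`voiceId` key names are assumptions based on the dashboard fields above; check Vapi's API reference before using this in production:

```python
import os
import requests

def voice_payload(provider, voice_id):
    """Voice config fragment; key names mirror the dashboard fields
    (provider, voiceId) and are assumptions for the REST API."""
    return {"voice": {"provider": provider, "voiceId": voice_id}}

def set_voice(assistant_id, provider, voice_id):
    """Update the assistant's voice via the assumed PATCH endpoint."""
    resp = requests.patch(
        f"https://api.vapi.ai/assistant/{assistant_id}",
        headers={"Authorization": f"Bearer {os.environ['VAPI_API_KEY']}"},
        json=voice_payload(provider, voice_id),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```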

Give your agent capabilities like web search or data lookup:

# When creating agent, add tools parameter
agent = client.agents.create(
    name="Voice Assistant with Tools",
    memory_blocks=[...],
    model="openai/gpt-4o-mini",
    tools=["web_search", "archival_memory_search"]
)

If you’re running your own Letta server, you can still use Vapi:

  1. Set up ngrok (to expose localhost):

    # Install ngrok from ngrok.com
    ngrok config add-authtoken YOUR_NGROK_TOKEN
    ngrok http http://localhost:8283

    Copy the forwarding URL (e.g., https://abc123.ngrok.app)

  2. Configure Vapi Assistant:

    • Model endpoint: https://abc123.ngrok.app/v1/chat/completions
    • Model: agent-YOUR_AGENT_ID
    • Add your Letta server auth token (if using password protection)
  3. Test the connection:

    • Use the Talk feature in Vapi dashboard
    • Monitor your Letta server logs for incoming requests

Issue: “pipeline-error-custom-llm-llm-failed”


Cause: API key not set in Vapi or incorrect endpoint

Solution:

  1. Verify Letta API key is set in Vapi integrations
  2. Check endpoint is exactly: https://api.letta.com/v1/chat/completions
  3. Verify agent ID format is agent-xxxxxxxxx (not just the ID)
  4. Test agent directly in Letta dashboard to confirm it works

Issue: “400-bad-request-validation-failed”


Cause: First message configuration error

Solution: Change first message mode to “assistant-speaks-first” with a static message:

  • Set mode: Assistant speaks first
  • Set message: Hello! How can I help you today?
  • Do NOT use “assistant-speaks-first-with-model-generated-message” (this mode triggers the validation error)

Issue: Echo or the agent hearing itself

Cause: Microphone picking up speaker output

Solution:

  • Always use headphones when testing
  • Enable echo cancellation in Vapi settings
  • For phone calls, this is handled automatically by the phone network

Issue: Agent not responding

Solution:

  1. Test Letta agent directly first:
from letta_client import Letta
import os
from dotenv import load_dotenv

load_dotenv()

client = Letta(token=os.getenv('LETTA_API_KEY'))
response = client.agents.messages.create(
    agent_id=os.getenv('LETTA_AGENT_ID'),
    messages=[{"role": "user", "content": "Hello"}]
)
for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)
  2. Check Vapi call logs at dashboard.vapi.ai/logs
  3. Verify your agent has appropriate tools and permissions

Issue: High latency

Solutions:

  • Use gpt-4o-mini instead of gpt-4 (much faster, lower cost)
  • Keep agent context/memory blocks concise
  • Avoid complex tool chains
  • Use Vapi’s built-in voices (lower latency than ElevenLabs)
  • Consider Deepgram Aura for ultra-low-latency TTS

Voice agents can get expensive quickly. Here’s how to keep costs down:

Model costs:

  • gpt-4o-mini: ~$0.15/1M input tokens (recommended)
  • gpt-4o: ~$2.50/1M input tokens (use only if needed)
  • gpt-4-turbo: ~$10/1M input tokens (avoid for voice)
Voice infrastructure costs:

  • ~$0.05/minute for Vapi base (transcription + basic TTS)
  • +$0.30/1K characters for ElevenLabs voices
  • +$2/month for phone numbers
Cost-saving tips:

  1. Set max_tokens to reasonable limits (150-300 for voice)
  2. Keep context windows small (clear history periodically)
  3. Use Vapi’s built-in voices instead of premium providers
  4. Monitor usage in both Letta and Vapi dashboards
  5. Implement conversation timeouts
  6. Use streaming (automatic with Vapi)
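As a back-of-the-envelope sanity check on the rates above, a small cost estimator (the defaults are the approximate figures listed here, not authoritative pricing):

```python
def call_cost(minutes, elevenlabs_chars=0, vapi_per_min=0.05, elevenlabs_per_kchar=0.30):
    """Estimate one call's cost from the approximate rates above."""
    return minutes * vapi_per_min + (elevenlabs_chars / 1000) * elevenlabs_per_kchar

print(round(call_cost(10), 2))                         # 10 min, built-in voices: 0.5
print(round(call_cost(10, elevenlabs_chars=5000), 2))  # same call with ElevenLabs: 2.0
```

The model-token cost is omitted because at gpt-4o-mini rates it is negligible next to the per-minute voice charges.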