# Letta SDK Reference for AI Agents
Everything an AI coding agent needs to build effective Letta applications.
This page is optimized for AI coding tools (Claude Code, Cursor, Copilot, etc.) to understand and build on Letta.
## Core Concept

Letta agents are stateful services, not stateless APIs.
- Agents persist in a database with their own conversation history
- Send only the NEW user message - never the full conversation
- Memory blocks persist across all interactions
- The server manages all state - your app just sends messages
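The stateful model means each turn is a single API call carrying only the new input. A minimal sketch of that loop (the helper name is ours, not part of the SDK; `client` and agent creation are covered in the sections below):

```python
# Hypothetical app-side helper, assuming a `client` set up as in "SDK Setup"
# and an existing agent created as in "Creating Agents".
def send_user_message(client, agent_id: str, text: str) -> list:
    """Send ONLY the new user message; the server already holds the history."""
    response = client.agents.messages.create(
        agent_id=agent_id,
        messages=[{"role": "user", "content": text}],  # one message, not a transcript
    )
    # Keep just the user-visible replies.
    return [m.content for m in response.messages
            if m.message_type == "assistant_message"]
```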
## SDK Setup

See the Quickstart for detailed setup instructions.
### Python

```shell
pip install letta-client
```

```python
from letta_client import Letta
import os

# Letta Cloud
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Self-hosted
client = Letta(base_url="http://localhost:8283")
```

### TypeScript

```shell
npm install @letta-ai/letta-client
```

```typescript
import Letta from "@letta-ai/letta-client";

// Letta Cloud
const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

// Self-hosted
const client = new Letta({ baseUrl: "http://localhost:8283" });
```

## Creating Agents

Full guide: Agent Overview
### Basic Agent

```python
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "I am a helpful assistant."},
        {"label": "human", "value": "User preferences will be stored here."}
    ]
)
```

```typescript
const agent = await client.agents.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  embedding: "openai/text-embedding-3-small",
  memory_blocks: [
    { label: "persona", value: "I am a helpful assistant." },
    { label: "human", value: "User preferences will be stored here." },
  ],
});
```

### Agent Creation Parameters

| Parameter | Type | Description |
|---|---|---|
| `model` | string | LLM model handle (e.g., `anthropic/claude-sonnet-4-5-20250929`, `openai/gpt-5.2`) |
| `embedding` | string | Embedding model for archival memory (e.g., `openai/text-embedding-3-small`) |
| `memory_blocks` | array | Core memory blocks (always in context) |
| `tools` | array | Tool names to attach |
| `tool_rules` | array | Constraints on tool execution order |
| `context_window_limit` | int | Max context size (default: 32000) |
| `enable_sleeptime` | bool | Enable background memory processing (Sleeptime guide) |
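A sketch combining several of the parameters above in one call (values are illustrative; the `"web_search"` tool name is an assumption, and `client` is set up as in "SDK Setup"):

```python
# Hypothetical parameter set; only model and embedding are shown as required
# in the basic example above — the rest are optional.
agent_params = {
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "embedding": "openai/text-embedding-3-small",
    "memory_blocks": [{"label": "persona", "value": "I am a research assistant."}],
    "tools": ["web_search"],  # assumed tool name, for illustration
    "tool_rules": [{"tool_name": "web_search", "type": "run_first"}],
    "context_window_limit": 32000,  # the documented default
    "enable_sleeptime": True,
}
# agent = client.agents.create(**agent_params)  # uncomment with a live client
```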
## Memory Blocks

Full guide: Memory Blocks | Memory Overview

Memory blocks are persistent storage that agents can read and edit. They're always visible in the agent's context.

```python
memory_blocks=[
    {
        "label": "persona",
        "value": "I am a customer support agent for Acme Corp.",
        "description": "Agent identity and behavior guidelines. Do not modify."
    },
    {
        "label": "human",
        "value": "",
        "description": "Store user preferences and information learned during conversation."
    },
    {
        "label": "tasks",
        "value": "No active tasks.",
        "description": "Current tasks and their status. Update as tasks are completed."
    }
]
```

**Important:** The `description` field guides the agent on how to use the block. Always include it for custom blocks.
### Shared Memory Blocks

Multiple agents can share the same memory block:

```python
# Create a shared block
block = client.blocks.create(
    label="company_policies",
    value="Return policy: 30 days with receipt...",
    description="Company policies. Read-only reference."
)

# Attach to multiple agents
client.agents.blocks.attach(agent_id=agent1.id, block_id=block.id)
client.agents.blocks.attach(agent_id=agent2.id, block_id=block.id)
```

## Sending Messages

Full guide: Messages
### Basic Message

```python
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}]
)

# Shorthand
response = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!"
)
```

```typescript
const response = await client.agents.messages.create(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
});

// Shorthand
const response = await client.agents.messages.create(agent.id, {
  input: "Hello!",
});
```

### Response Handling
```python
for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)  # The actual response text
    elif msg.message_type == "reasoning_message":
        print(f"Thinking: {msg.reasoning}")
    elif msg.message_type == "tool_call_message":
        print(f"Called: {msg.tool_call.name}")
    elif msg.message_type == "tool_return_message":
        print(f"Result: {msg.tool_return}")
```

Message types:

- `assistant_message` - The user-visible response (use `.content`)
- `reasoning_message` - Agent's internal reasoning
- `tool_call_message` - Tool invocation
- `tool_return_message` - Tool execution result
- `usage_statistics` - Token usage info
## Streaming

Full guide: Streaming

```python
# Step streaming (complete messages)
stream = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!",
    streaming=True
)

for chunk in stream:
    if chunk.message_type == "assistant_message":
        print(chunk.content, end="", flush=True)

# Token streaming (character by character)
stream = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!",
    streaming=True,
    stream_tokens=True
)
```

```typescript
// Step streaming
const stream = await client.agents.messages.stream(agent.id, {
  input: "Hello!",
});

for await (const chunk of stream) {
  if (chunk.message_type === "assistant_message") {
    process.stdout.write(chunk.content);
  }
}

// Token streaming
const tokenStream = await client.agents.messages.stream(agent.id, {
  input: "Hello!",
  stream_tokens: true,
});
```

## Memory Tools

Full guide: Memory
Agents have built-in tools to manage their memory:
| Tool | Purpose |
|---|---|
| `memory` | Unified tool for create/edit/delete/rename blocks |
| `memory_insert` | Insert text at a specific line |
| `memory_replace` | Replace exact text match |
| `memory_rethink` | Completely rewrite a block |
| `archival_memory_insert` | Store in long-term archival memory |
| `archival_memory_search` | Search archival memory |
| `conversation_search` | Search past conversation history |
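These tools are invoked by the agent itself, not by your application. You can observe which ones ran by scanning the response messages; a sketch using the message shapes from "Response Handling" (the helper name is ours, not part of the SDK):

```python
# Hypothetical helper: list the tools the agent called during one turn.
def tools_called(messages) -> list:
    """Return the names of the tools the agent invoked in a response."""
    return [m.tool_call.name
            for m in messages
            if m.message_type == "tool_call_message"]
```

For example, after a message like "remember that I prefer dark mode", `tools_called(response.messages)` might include `memory_replace`.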
## Archival Memory

Full guide: Archival Memory

For large amounts of data, use archival memory (vector database storage):

```python
# Agents can use these tools automatically, or you can insert directly:
client.agents.archival.create(
    agent_id=agent.id,
    content="Important fact to remember long-term..."
)

# Search archival memory
results = client.agents.archival.list(
    agent_id=agent.id,
    query="search query",
    limit=10
)
```

## Custom Tools

Full guide: Custom Tools | Tool Variables
### Function-Based (Recommended)

```python
def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: City name or zip code

    Returns:
        Current weather conditions
    """
    # Implementation here
    return f"Weather in {location}: 72°F, sunny"

# Create the tool
tool = client.tools.create(func=get_weather)

# Attach to agent
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[tool.name],
    # ... other params
)
```

### Tool Environment Variables

Pass secrets to tools without exposing them in code:
```python
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=["my_api_tool"],
    secrets={"API_KEY": "sk-..."}  # Available as os.getenv("API_KEY") in tool
)
```

### Client Injection (Advanced)

Tools on Letta Cloud get automatic access to a pre-initialized client:
```python
def update_memory(label: str, content: str) -> str:
    """Update a memory block."""
    import os
    # `client` is pre-injected, no initialization needed
    client.agents.blocks.update(
        agent_id=os.getenv("LETTA_AGENT_ID"),
        block_label=label,
        value=content
    )
    return "Memory updated"
```

## Tool Rules

Full guide: Tool Rules

Constrain tool execution order:
```python
tool_rules=[
    {"tool_name": "final_answer", "type": "exit_loop"},  # Ends agent execution
    {"tool_name": "search", "type": "run_first"},        # Must run first
]
```

Rule types:

- `exit_loop` - Tool ends agent execution (terminal)
- `run_first` - Tool must be called first
- `continue` - Agent must continue after this tool

For complex workflows with child/parent relationships, use typed classes:

```python
from letta_client.types import ChildToolRule, TerminalToolRule

tool_rules=[
    ChildToolRule(tool_name="search", children=["search", "summarize"]),
    TerminalToolRule(tool_name="final_answer"),
]
```

## Common Patterns

### One Agent Per User

Full guide: Multi-User Agents
```python
def get_or_create_agent(user_id: str):
    # Check for existing agent
    agents = client.agents.list(tags=[f"user:{user_id}"])
    if agents:
        return agents[0]

    # Create new agent for user
    return client.agents.create(
        model="anthropic/claude-sonnet-4-5-20250929",
        tags=[f"user:{user_id}"],
        memory_blocks=[
            {"label": "persona", "value": "..."},
            {"label": "human", "value": f"User ID: {user_id}"}
        ]
    )
```

### Multi-Agent with Shared Memory

```python
# Create shared knowledge block
knowledge = client.blocks.create(
    label="shared_knowledge",
    value="Facts all agents should know..."
)

# Both agents see the same block
agent1 = client.agents.create(...)
agent2 = client.agents.create(...)

client.agents.blocks.attach(agent_id=agent1.id, block_id=knowledge.id)
client.agents.blocks.attach(agent_id=agent2.id, block_id=knowledge.id)

# When agent1 updates it, agent2 sees the change
```

## Anti-Patterns (Avoid These)

| Don't | Do Instead |
|---|---|
| Send full conversation history each message | Send only the new message - agent maintains history |
| Create a new agent per conversation | Reuse agents - they’re persistent services |
| Store large documents in memory blocks | Use archival memory for large content |
| Skip `description` on custom blocks | Always describe how the agent should use each block |
| Use `.text` on messages | Use `.content` for message text |
| Call `client.agents.chat()` | Use `client.agents.messages.create()` |
## Model Recommendations

| Use Case | Recommended Model |
|---|---|
| Complex reasoning, agentic tasks | `anthropic/claude-sonnet-4-5-20250929` or `openai/gpt-5.2` |
| Cost-efficient general tasks | `openai/gpt-4o-mini` |
| Fast, lightweight | `anthropic/claude-haiku-4-5` |
**Avoid:** Small local models (under 7B params) for tool-heavy agents - they struggle with function calling.
## Quick Reference

### Agent Lifecycle

```python
# Create
agent = client.agents.create(...)

# Get
agent = client.agents.retrieve(agent_id)

# Update
client.agents.modify(agent_id, model="openai/gpt-5.2")

# Delete
client.agents.delete(agent_id)

# List
agents = client.agents.list(tags=["production"])
```

### Memory Block Operations
```python
# List agent's blocks
blocks = client.agents.blocks.list(agent_id)

# Update block content
client.agents.blocks.update(agent_id, block_label="human", value="New content")

# Attach existing block
client.agents.blocks.attach(agent_id, block_id=block.id)

# Detach block
client.agents.blocks.detach(agent_id, block_id=block.id)
```

### Message History
```python
# Get conversation history
messages = client.agents.messages.list(agent_id, limit=100)

# Search history
results = client.agents.messages.search(agent_id, query="deployment")
```

## Resources

### Page-specific markdown

Append `/index.md` to any docs URL for LLM-friendly markdown:

`https://docs.letta.com/quickstart` → `https://docs.letta.com/quickstart/index.md`
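A small sketch of building that markdown URL programmatically (stdlib only; the helper name is ours):

```python
import urllib.request  # only needed for the optional fetch below

def docs_markdown_url(page_url: str) -> str:
    """Append /index.md to a docs URL (stripping any trailing slash first)."""
    return page_url.rstrip("/") + "/index.md"

# Optional network fetch; uncomment to run:
# md = urllib.request.urlopen(
#     docs_markdown_url("https://docs.letta.com/quickstart")
# ).read().decode("utf-8")
```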