Letta SDK Reference for AI Agents

Everything an AI coding agent needs to build effective Letta applications.

This page is optimized for AI coding tools (Claude Code, Cursor, Copilot, etc.) to understand and build on Letta.

Letta agents are stateful services, not stateless APIs.

  • Agents persist in a database with their own conversation history
  • Send only the NEW user message - never the full conversation (see the sketch after this list)
  • Memory blocks persist across all interactions
  • The server manages all state - your app just sends messages
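
For example, a two-turn chat sends each user turn on its own; the server already remembers earlier turns (a minimal sketch, assuming the `client` and `agent` created in the setup sections below):

# Turn 1: send only this message
client.agents.messages.create(agent_id=agent.id, input="My name is Sam.")

# Turn 2: still only the new message - no history replay needed
response = client.agents.messages.create(agent_id=agent.id, input="What's my name?")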

See the Quickstart for detailed setup instructions.

pip install letta-client

from letta_client import Letta
import os
# Letta Cloud
client = Letta(api_key=os.getenv("LETTA_API_KEY"))
# Self-hosted
client = Letta(base_url="http://localhost:8283")
npm install @letta-ai/letta-client

import Letta from "@letta-ai/letta-client";
// Letta Cloud
const client = new Letta({ apiKey: process.env.LETTA_API_KEY });
// Self-hosted
const client = new Letta({ baseUrl: "http://localhost:8283" });

Full guide: Agent Overview

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "I am a helpful assistant."},
        {"label": "human", "value": "User preferences will be stored here."}
    ]
)
| Parameter | Type | Description |
| --- | --- | --- |
| model | string | LLM model handle (e.g., anthropic/claude-sonnet-4-5-20250929, openai/gpt-5.2) |
| embedding | string | Embedding model for archival memory (e.g., openai/text-embedding-3-small) |
| memory_blocks | array | Core memory blocks (always in context) |
| tools | array | Tool names to attach |
| tool_rules | array | Constraints on tool execution order |
| context_window_limit | int | Max context size (default: 32000) |
| enable_sleeptime | bool | Enable background memory processing (Sleeptime guide) |
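
A fuller creation call that exercises the optional parameters above (a sketch; the web_search tool name and its rule are illustrative placeholders, not tools you get by default):

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[{"label": "persona", "value": "I am a research assistant."}],
    tools=["web_search"],  # placeholder - attach tools you have created
    tool_rules=[{"tool_name": "web_search", "type": "run_first"}],
    context_window_limit=32000,
    enable_sleeptime=False
)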

Full guide: Memory Blocks | Memory Overview

Memory blocks are persistent storage that agents can read and edit. They’re always visible in the agent’s context.

memory_blocks=[
    {
        "label": "persona",
        "value": "I am a customer support agent for Acme Corp.",
        "description": "Agent identity and behavior guidelines. Do not modify."
    },
    {
        "label": "human",
        "value": "",
        "description": "Store user preferences and information learned during conversation."
    },
    {
        "label": "tasks",
        "value": "No active tasks.",
        "description": "Current tasks and their status. Update as tasks are completed."
    }
]

Important: The description field guides the agent on how to use the block. Always include it for custom blocks.

Multiple agents can share the same memory block:

# Create a shared block
block = client.blocks.create(
    label="company_policies",
    value="Return policy: 30 days with receipt...",
    description="Company policies. Read-only reference."
)

# Attach to multiple agents
client.agents.blocks.attach(agent_id=agent1.id, block_id=block.id)
client.agents.blocks.attach(agent_id=agent2.id, block_id=block.id)

Full guide: Messages

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}]
)

# Shorthand
response = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!"
)

for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)  # The actual response text
    elif msg.message_type == "reasoning_message":
        print(f"Thinking: {msg.reasoning}")
    elif msg.message_type == "tool_call_message":
        print(f"Called: {msg.tool_call.name}")
    elif msg.message_type == "tool_return_message":
        print(f"Result: {msg.tool_return}")

Message types:

  • assistant_message - The user-visible response (use .content; see the helper after this list)
  • reasoning_message - Agent’s internal reasoning
  • tool_call_message - Tool invocation
  • tool_return_message - Tool execution result
  • usage_statistics - Token usage info
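
A small helper that pulls out just the user-visible reply (a sketch, assuming the message shapes listed above):

def get_reply(response) -> str:
    """Concatenate the assistant-visible text from a Letta response."""
    return "".join(
        msg.content
        for msg in response.messages
        if msg.message_type == "assistant_message"
    )

print(get_reply(response))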

Full guide: Streaming

# Step streaming (complete messages)
stream = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!",
    streaming=True
)
for chunk in stream:
    if chunk.message_type == "assistant_message":
        print(chunk.content, end="", flush=True)

# Token streaming (partial chunks as they are generated)
stream = client.agents.messages.create(
    agent_id=agent.id,
    input="Hello!",
    streaming=True,
    stream_tokens=True
)
for chunk in stream:
    if chunk.message_type == "assistant_message":
        print(chunk.content, end="", flush=True)  # each chunk is a fragment of the reply

Full guide: Memory

Agents have built-in tools to manage their memory (a usage sketch follows the table):

| Tool | Purpose |
| --- | --- |
| memory | Unified tool for create/edit/delete/rename blocks |
| memory_insert | Insert text at a specific line |
| memory_replace | Replace exact text match |
| memory_rethink | Completely rewrite a block |
| archival_memory_insert | Store in long-term archival memory |
| archival_memory_search | Search archival memory |
| conversation_search | Search past conversation history |
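
These tools run inside the agent, so you trigger them by talking to it rather than calling them yourself. A quick way to verify a memory edit (a sketch, using only calls shown elsewhere on this page):

# Ask the agent to remember something; it should invoke a memory tool itself
client.agents.messages.create(
    agent_id=agent.id,
    input="Please remember that I prefer metric units."
)

# Inspect the human block to confirm the edit landed
for block in client.agents.blocks.list(agent_id=agent.id):
    if block.label == "human":
        print(block.value)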

Full guide: Archival Memory

For large amounts of data, use archival memory (vector database storage):

# Agent can use these tools automatically, or you can insert directly:
client.agents.archival.create(
    agent_id=agent.id,
    content="Important fact to remember long-term..."
)

# Search archival memory
results = client.agents.archival.list(
    agent_id=agent.id,
    query="search query",
    limit=10
)

Full guide: Custom Tools | Tool Variables

def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: City name or zip code

    Returns:
        Current weather conditions
    """
    # Implementation here
    return f"Weather in {location}: 72°F, sunny"

# Create the tool
tool = client.tools.create(func=get_weather)

# Attach to agent
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[tool.name],
    # ... other params
)

Pass secrets to tools without exposing them in code:

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=["my_api_tool"],
    secrets={"API_KEY": "sk-..."}  # Available as os.getenv("API_KEY") in tool
)

Tools on Letta Cloud get automatic access to a pre-initialized client:

def update_memory(label: str, content: str) -> str:
    """Update a memory block."""
    import os
    # `client` is pre-injected, no initialization needed
    client.agents.blocks.update(
        agent_id=os.getenv("LETTA_AGENT_ID"),
        block_label=label,
        value=content
    )
    return "Memory updated"

Full guide: Tool Rules

Constrain tool execution order:

tool_rules=[
    {"tool_name": "final_answer", "type": "exit_loop"},  # Ends agent execution
    {"tool_name": "search", "type": "run_first"},        # Must run first
]

Rule types:

  • exit_loop - Tool ends agent execution (terminal)
  • run_first - Tool must be called first
  • continue - Agent must continue after this tool

For complex workflows with child/parent relationships, use typed classes:

from letta_client.types import ChildToolRule, TerminalToolRule

tool_rules=[
    ChildToolRule(tool_name="search", children=["search", "summarize"]),
    TerminalToolRule(tool_name="final_answer"),
]

Full guide: Multi-User Agents

def get_or_create_agent(user_id: str):
    # Check for existing agent
    agents = client.agents.list(tags=[f"user:{user_id}"])
    if agents:
        return agents[0]
    # Create new agent for user
    return client.agents.create(
        model="anthropic/claude-sonnet-4-5-20250929",
        tags=[f"user:{user_id}"],
        memory_blocks=[
            {"label": "persona", "value": "..."},
            {"label": "human", "value": f"User ID: {user_id}"}
        ]
    )
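
Typical routing then looks like this (a sketch; the user ID is illustrative):

# Each user gets one persistent agent, keyed by tag
agent = get_or_create_agent("user-123")
response = client.agents.messages.create(agent_id=agent.id, input="Hi again!")
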
# Create shared knowledge block
knowledge = client.blocks.create(
    label="shared_knowledge",
    value="Facts all agents should know..."
)

# Both agents see the same block
agent1 = client.agents.create(...)
agent2 = client.agents.create(...)
client.agents.blocks.attach(agent_id=agent1.id, block_id=knowledge.id)
client.agents.blocks.attach(agent_id=agent2.id, block_id=knowledge.id)
# When agent1 updates it, agent2 sees the change

| Don't | Do Instead |
| --- | --- |
| Send full conversation history each message | Send only the new message - the agent maintains history |
| Create a new agent per conversation | Reuse agents - they're persistent services |
| Store large documents in memory blocks | Use archival memory for large content |
| Skip description on custom blocks | Always describe how the agent should use each block |
| Use .text on messages | Use .content for message text |
| Call client.agents.chat() | Use client.agents.messages.create() |

| Use Case | Recommended Model |
| --- | --- |
| Complex reasoning, agentic tasks | anthropic/claude-sonnet-4-5-20250929 or openai/gpt-5.2 |
| Cost-efficient general tasks | openai/gpt-4o-mini |
| Fast, lightweight | anthropic/claude-haiku-4-5 |

Avoid: Small local models (under 7B params) for tool-heavy agents - they struggle with function calling.

# Create
agent = client.agents.create(...)
# Get
agent = client.agents.retrieve(agent_id)
# Update
client.agents.modify(agent_id, model="openai/gpt-5.2")
# Delete
client.agents.delete(agent_id)
# List
agents = client.agents.list(tags=["production"])

# List agent's blocks
blocks = client.agents.blocks.list(agent_id)
# Update block content
client.agents.blocks.update(agent_id, block_label="human", value="New content")
# Attach existing block
client.agents.blocks.attach(agent_id, block_id=block.id)
# Detach block
client.agents.blocks.detach(agent_id, block_id=block.id)

# Get conversation history
messages = client.agents.messages.list(agent_id, limit=100)
# Search history
results = client.agents.messages.search(agent_id, query="deployment")

Append /index.md to any docs URL for LLM-friendly markdown:

  • https://docs.letta.com/quickstart → https://docs.letta.com/quickstart/index.md
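
For example, fetching the markdown from Python (a minimal sketch using the standard library):

import urllib.request

with urllib.request.urlopen("https://docs.letta.com/quickstart/index.md") as resp:
    print(resp.read().decode())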