Agent memory & architecture
Letta agents solve the context window limitation of LLMs through context engineering across two tiers of memory: in-context (core) memory (including system instructions, read-write memory blocks, and conversation history), and out-of-context memory (older evicted conversation history and archival storage).
To learn more about the research origins, read the MemGPT research paper, or take the free LLM OS course on DeepLearning.ai.
Memory Hierarchy
```mermaid
graph LR
  subgraph CONTEXT[Context Window]
    SYS[System Instructions]
    CORE[Memory Blocks]
    MSGS[Messages]
  end
  RECALL[Recall Memory]
  ARCH[Archival Memory]
  CONTEXT <--> RECALL
  CONTEXT <--> ARCH
```
In-context (core) memory
Your agent's context window contains:
- System instructions: Your agent’s base behavior and capabilities
- Memory blocks: Persistent, always-visible information (persona, user info, working state, etc.)
- Recent messages: Latest conversation history
Out-of-context memory
When the context window fills up:
- Recall memory: Older messages searchable via the `conversation_search` tool
- Archival memory: Long-term semantic storage searchable via the `archival_memory_search` tool
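The eviction flow can be sketched with a toy two-tier store. This is a conceptual illustration only, assuming a made-up `MessageStore` class and a plain substring search; the real `conversation_search` tool runs server-side with full-text search.

```python
# Hypothetical sketch of the two-tier message store. The class name,
# eviction policy, and substring search are illustrative, not Letta's API.
from collections import deque


class MessageStore:
    def __init__(self, context_limit: int):
        self.context_limit = context_limit
        self.in_context = deque()  # recent messages, visible to the model
        self.recall = []           # evicted messages, searchable on demand

    def append(self, message: str) -> None:
        self.in_context.append(message)
        # When the context window fills up, older messages are evicted
        # to recall storage rather than deleted.
        while len(self.in_context) > self.context_limit:
            self.recall.append(self.in_context.popleft())

    def conversation_search(self, query: str) -> list[str]:
        # Stand-in for the real tool's full-text search.
        return [m for m in self.recall if query.lower() in m.lower()]


store = MessageStore(context_limit=2)
for msg in ["I live in Berlin.", "I like hiking.", "What's the weather?"]:
    store.append(msg)

print(store.conversation_search("berlin"))  # ['I live in Berlin.']
```

The key property: nothing is lost when a message leaves the context window; it simply becomes searchable instead of always-visible.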
Agent Architecture
Letta's agent architecture follows modern LLM patterns:
- Native reasoning: Uses model’s built-in reasoning capabilities (Responses API for OpenAI, encrypted reasoning for other providers)
- Direct messaging: Agents respond with assistant messages
- Compatibility: Works with any LLM, tool calling not required
- Self-directed termination: Agents decide when to continue or stop
This architecture is optimized for frontier models like GPT-5 and Claude Sonnet 4.5.
Learn more about the architecture evolution →
Memory Tools
Letta agents have tools to manage their own memory:
Memory block editing
- `memory_insert` - Insert text into a memory block
- `memory_replace` - Replace specific text in a memory block
- `memory_rethink` - Completely rewrite a memory block
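The intended semantics of the three editing tools can be sketched as pure functions over a block's text. These are illustrative stand-ins, not the SDK's signatures; the real tools operate on the agent's named memory blocks server-side.

```python
# Illustrative semantics for the three block-editing tools (not Letta's API).

def memory_insert(block: str, new_text: str) -> str:
    """memory_insert: append text to a memory block."""
    return block + ("\n" if block else "") + new_text


def memory_replace(block: str, old_text: str, new_text: str) -> str:
    """memory_replace: swap a specific substring for new text."""
    if old_text not in block:
        raise ValueError("old_text not found in block")
    return block.replace(old_text, new_text)


def memory_rethink(block: str, new_block: str) -> str:
    """memory_rethink: discard the block and rewrite it from scratch."""
    return new_block


human = "The human's name is Chad."
human = memory_insert(human, "They like vibe coding.")
human = memory_replace(human, "vibe coding", "pair programming")
print(human)
```

`memory_replace` is the precise, surgical edit; `memory_rethink` is the escape hatch when a block has drifted enough that a rewrite is cleaner than patching.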
Recall memory
- `conversation_search` - Search prior conversation history
Archival memory
- `archival_memory_insert` - Store facts and knowledge long-term
- `archival_memory_search` - Query semantic storage
Learn more about memory tools →
Creating Agents
Agents are created with memory blocks that define their persistent context:
```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  embedding: "openai/text-embedding-3-small",
  memory_blocks: [
    {
      label: "human",
      value: "The human's name is Chad. They like vibe coding.",
    },
    {
      label: "persona",
      value: "My name is Sam, the all-knowing sentient AI.",
    },
  ],
  tools: ["web_search", "run_code"],
});
```

```python
import os

from letta_client import Letta

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "human", "value": "The human's name is Chad. They like vibe coding."},
        {"label": "persona", "value": "My name is Sam, the all-knowing sentient AI."},
    ],
    tools=["web_search", "run_code"],
)
```

```shell
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "embedding": "openai/text-embedding-3-small",
    "memory_blocks": [
      {
        "label": "human",
        "value": "The human'\''s name is Chad. They like vibe coding."
      },
      {
        "label": "persona",
        "value": "My name is Sam, the all-knowing sentient AI."
      }
    ],
    "tools": ["web_search", "run_code"]
  }'
```

Context Window Management
When the context window fills up, Letta automatically:
- Compacts older messages into a recursive summary
- Moves full message history to recall storage
- Keeps evicted history searchable via the `conversation_search` tool
This happens transparently, so your agent maintains continuity.
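The compaction step above can be sketched in a few lines. This is a minimal illustration, assuming a made-up `summarize` stand-in; Letta's real recursive summary is produced by an LLM.

```python
# Minimal sketch of context compaction (illustrative names, not Letta's API).

def summarize(previous_summary: str, evicted: list[str]) -> str:
    # Stand-in: a real implementation would call an LLM here,
    # folding the evicted messages into the running summary.
    return f"{previous_summary} {' '.join(evicted)}".strip()


def compact(messages: list[str], summary: str, keep_recent: int):
    """Fold older messages into the recursive summary; keep the rest in context."""
    if len(messages) <= keep_recent:
        return messages, summary, []
    evicted = messages[:-keep_recent]
    new_summary = summarize(summary, evicted)
    # Evicted messages also move to recall storage, still searchable.
    return messages[-keep_recent:], new_summary, evicted


msgs = ["hi", "I'm Ada", "I code in Rust", "what's new?"]
in_context, summary, recall = compact(msgs, summary="", keep_recent=2)
print(in_context)  # ['I code in Rust', "what's new?"]
print(recall)      # ['hi', "I'm Ada"]
```

The summary is "recursive" because each compaction folds the previous summary plus newly evicted messages into a fresh summary, so it always reflects the whole conversation.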
Populating Archival Memory
Agents can insert memories during conversations, or you can populate archival memory programmatically:
```typescript
// Insert a memory via SDK
await client.agents.passages.insert(agent.id, {
  content: "The user prefers TypeScript over JavaScript for type safety.",
  tags: ["preferences", "languages"],
});

// Agent can now search this
// Agent calls: archival_memory_search(query="language preferences")
```

```python
# Insert a memory via SDK
client.agents.passages.insert(
    agent_id=agent.id,
    content="The user prefers TypeScript over JavaScript for type safety.",
    tags=["preferences", "languages"],
)

# Agent can now search this
# Agent calls: archival_memory_search(query="language preferences")
```

Learn more about archival memory →
Research Background
Key concepts from the MemGPT research:
- Self-editing memory: Agents actively manage their own memory
- Memory hierarchy: In-context vs out-of-context storage
- Tool-based memory management: Agents decide what to remember
- Stateful agents: Persistent memory across all interactions
Read the MemGPT paper → Take the free course →