Core Concepts

Understanding what makes Letta different

The Fundamental Limitation of LLMs

Large language models are stateless by design. An LLM’s knowledge comes from two sources:

  1. Model weights - Fixed after training
  2. Context window - Ephemeral input provided at inference time

This means LLMs have no persistent memory between interactions. Each API call starts from scratch, with no ability to learn from past experiences or maintain state across sessions.

What are Stateful Agents?

Stateful agents overcome this limitation by maintaining persistent memory and identity across all interactions.

A stateful agent has:

  • Persistent identity - Exists as a unique entity with continuity across sessions
  • Active memory formation - Autonomously decides what information to store and update
  • Accumulated state - Learns through experience rather than just model weights
  • Long-term context - Maintains knowledge beyond single conversation windows

Unlike traditional LLM applications where your code manages state, stateful agents actively manage their own memory using built-in tools to read, write, and search their persistent storage.

Why Statefulness Matters

Traditional LLM applications are stateless - every interaction starts from scratch. Your application must:

  • Store all conversation history in your own database
  • Send the entire context with every API call
  • Implement memory and personalization logic yourself
  • Manually manage context window limits

With Letta’s stateful agents, all of this is handled for you. The agent maintains its own persistent state, intelligently manages its context window, and learns from every interaction without requiring you to build a complex state management layer.

Stateful vs Stateless APIs

The difference between stateful agents and traditional LLM APIs is fundamental:

Traditional APIs (stateless): No memory between requests. Your app manages everything.

Letta (stateful): Agents maintain their own persistent state. You only send new messages.

Traditional Stateless API

With stateless APIs, nothing persists between requests; the client must resend the full conversation history with every call:

  • Request 2: [msg1, response1, msg2]
  • Request 3: [msg1, response1, msg2, response2, msg3]
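The growth of the request payload above can be sketched in a few lines of Python. This is illustrative only; `call_llm` is a stand-in for any stateless chat-completion endpoint:

```python
# Illustrative sketch: with a stateless API, the client owns the history.
# `call_llm` is a placeholder for a real API call; it returns a canned reply.

def call_llm(messages):
    return {"role": "assistant", "content": f"reply to {messages[-1]['content']}"}

history = []

for turn, text in enumerate(["msg1", "msg2", "msg3"], start=1):
    history.append({"role": "user", "content": text})
    print(f"Request {turn} payload size: {len(history)} messages")
    history.append(call_llm(history))

# Request 1 sends 1 message, request 2 sends 3, request 3 sends 5 —
# the payload grows linearly with conversation length.
```

Every turn, the client re-sends everything it has accumulated so far, which is what makes long conversations expensive with stateless APIs.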

Letta Stateful API

Letta maintains agent state on the server and persists it to a database. The client sends only the new message in each request; the server handles all state management:

  • Request 2: [msg2]
  • Request 3: [msg3]

Key Differences

| Aspect | Traditional (Stateless) | Letta (Stateful) |
| --- | --- | --- |
| State management | Client-side | Server-side |
| Request format | Send full conversation history | Send only new messages |
| Memory | None (ephemeral) | Persistent database |
| Context limit | Hard limit, then fails | Intelligent management |
| Agent identity | None | Each agent has unique ID |
| Long conversations | Expensive & brittle | Scales infinitely |
| Personalization | App must manage | Built-in memory blocks |
| Multi-session | Requires external DB | Native support |

Code Comparison

Stateless API (e.g., OpenAI):

```python
from openai import OpenAI

client = OpenAI()

# You must send the entire conversation every time
messages = [
    {"role": "user", "content": "Hello, I'm Sarah"},
    {"role": "assistant", "content": "Hi Sarah!"},
    {"role": "user", "content": "What's my name?"},  # ← New message
]

# Send everything
response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,  # ← Full history required
)

# You must store and manage messages yourself
messages.append(response.choices[0].message)
```

Stateful API (Letta):

```python
# Agent already knows the context
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content": "What's my name?"}  # ← New message only
    ],
)

# Agent remembers Sarah from its memory blocks
# No need to send previous messages
```

Agents as Services

Letta treats agents as persistent services, not ephemeral library calls.

In traditional frameworks, agents are objects that live in your application’s memory and disappear when your app stops. In Letta, agents are independent services that:

  • Continue to exist when your application isn’t running
  • Maintain state in a database
  • Can be accessed from multiple applications simultaneously
  • Run autonomously on the server

You interact with Letta agents through REST APIs:

POST /agents/{agent_id}/messages

This architecture enables:

  • Multi-user applications - Each user gets their own persistent agent
  • Agent-to-agent communication - Agents can message each other
  • Background processing - Agents can continue working while your app is offline
  • Deployment flexibility - Scale agents independently from your application

Persistence by Default

In Letta, all state is persisted automatically:

  • Agent memory (both memory blocks and archival)
  • Message history
  • Tool configurations
  • Agent state and context

Because everything is persisted:

  • Agents can be paused and resumed at any time
  • You can reload agents across different machines
  • State is never lost due to application restarts
  • Long conversations don’t degrade performance

Self-Editing Memory

Unlike RAG systems that passively retrieve documents, Letta agents actively manage their own memory. Agents use built-in tools to:

  • Edit their memory blocks when learning new information
  • Insert facts into archival memory for long-term storage
  • Search their past conversations when context is needed
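To make the read/write/search loop above concrete, here is a toy sketch of self-editing memory. The class and method names are illustrative only — Letta exposes equivalent operations as built-in tools that the agent itself decides to call, and real archival search is semantic rather than substring matching:

```python
# Toy sketch of self-editing memory (illustrative, not Letta's implementation).

class AgentMemory:
    def __init__(self):
        self.blocks = {}      # in-context memory blocks, always visible to the agent
        self.archival = []    # long-term storage, searched on demand

    def core_memory_replace(self, label, new_value):
        # Edit a memory block when the agent learns new information.
        self.blocks[label] = new_value

    def archival_memory_insert(self, fact):
        # Store a fact for long-term retrieval.
        self.archival.append(fact)

    def archival_memory_search(self, query):
        # Naive substring search stands in for semantic retrieval.
        return [f for f in self.archival if query.lower() in f.lower()]

memory = AgentMemory()
memory.core_memory_replace("human", "Name: Sarah. Prefers concise answers.")
memory.archival_memory_insert("Sarah's project deadline is March 3.")

print(memory.archival_memory_search("deadline"))
# → ["Sarah's project deadline is March 3."]
```

The key point is that the agent, not your application code, invokes these operations as it converses.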

This enables agents to:

  • Learn user preferences over time
  • Maintain consistent personality across sessions
  • Build long-term relationships with users
  • Continuously improve from interactions

Learn more about memory →

Agents vs Threads

Letta doesn’t have the concept of threads or sessions. Instead, there are only stateful agents with a single perpetual message history.

Why no threads? Letta is built on the principle that all interactions should be part of persistent memory, not ephemeral sessions. This enables:

  • Continuous learning across all conversations
  • True long-term memory and relationships
  • No context loss when “starting a new thread”

For multi-user applications, we recommend creating one agent per user. Each agent maintains its own persistent memory about that specific user.
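The one-agent-per-user pattern can be sketched as a simple lookup. `create_agent_for_user` is a hypothetical helper — in practice it would call the Letta client to create a persistent agent and return its ID, and the registry would live in your own database rather than an in-memory dict:

```python
# Sketch of the one-agent-per-user pattern (names are illustrative).

agent_registry = {}  # user_id -> agent_id; use a real DB in practice

def create_agent_for_user(user_id):
    # Placeholder for a real client call that creates a persistent agent.
    return f"agent-for-{user_id}"

def get_or_create_agent(user_id):
    if user_id not in agent_registry:
        agent_registry[user_id] = create_agent_for_user(user_id)
    return agent_registry[user_id]

# Every request from the same user routes to the same persistent agent.
assert get_or_create_agent("sarah") == get_or_create_agent("sarah")
print(get_or_create_agent("sarah"))  # → agent-for-sarah
```

Because each agent accumulates memory about exactly one user, there is no risk of one user's facts leaking into another user's context.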

If you need conversation templates or starting points, use agent templates to create new agents with pre-configured state.

LLM OS

The LLM Operating System is the infrastructure layer that manages agent execution, state, and memory. This includes:

  • Agent runtime - Manages tool execution and the reasoning loop
  • Memory layer - Handles context window management and persistence
  • Stateful layer - Coordinates state across database, cache, and execution

Letta’s architecture is inspired by the MemGPT research paper, which introduced these concepts.

Beyond Model Size

The path to more capable AI systems isn’t just about larger models or longer context windows. Stateful agents represent a fundamental shift: agents that learn through accumulated experience, build lasting relationships with users, and continuously improve without retraining.

With stateful agents, you can build:

  • Personalized assistants that adapt to individual users over time
  • Learning systems that improve from feedback and interactions
  • Long-term relationships where agents develop deep context about users and tasks
  • Autonomous services that operate independently and maintain their own knowledge

This architectural shift—from stateless function calls to stateful agent services—enables a new class of AI applications that weren’t possible with traditional LLM APIs.

Next Steps