How it works
Understand how Letta Code is built and how to customize it
Letta Code is a lightweight CLI harness built around the Letta TypeScript SDK. It gives Letta agents (running on a server, local or remote) the ability to interact with your local development environment.
Architecture overview
```mermaid
flowchart TB
  subgraph Terminal["Your terminal"]
    subgraph LettaCode["Letta Code"]
      UI["CLI UI / Headless"]
      Executor["Tool Executor"]
      Permissions["Permission Manager"]
      SDK["Letta TS SDK"]
    end
  end
  Server["Letta server"]
  UI --> SDK
  Executor --> SDK
  Permissions --> SDK
  SDK <-->|Letta API| Server
```
Letta Code’s main job is to:
- Manage the messaging lifecycle - Send messages between the user and agent
- Execute tools locally - Run Bash, Read, Write, Edit, etc. on your machine
- Handle permissions - Prompt the user for approval before tool execution
- Provide a terminal UI - Render agent state, streaming responses, and tool calls
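To make the permission step concrete, here is a minimal sketch of how a harness might gate tool calls behind a terminal prompt. This is illustrative only: the `ToolCall` shape and the `execute` callback are hypothetical, not Letta Code’s actual internals.

```typescript
// Minimal sketch of a permission gate (illustrative, not Letta Code's
// actual internals): ask the user before any tool call touches the machine.
import * as readline from "node:readline/promises";

type ToolCall = { name: string; arguments: string };

async function promptUser(question: string): Promise<boolean> {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  const answer = await rl.question(question);
  rl.close();
  return answer.trim().toLowerCase() === "y";
}

async function runWithApproval(
  toolCall: ToolCall,
  execute: (tc: ToolCall) => Promise<string>,
): Promise<string> {
  const approved = await promptUser(
    `Allow ${toolCall.name}(${toolCall.arguments})? [y/N] `,
  );
  if (!approved) return "Tool call denied by user";
  // Only reaches the local machine after explicit approval
  return execute(toolCall);
}
```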
Client-side tool execution
The core mechanism that makes Letta Code work is client-side tool execution. Your Letta agent runs on an external server, but the tools it calls - like Bash, Read, and Write - execute locally on your machine.
This is done using the client-side tools feature of the Letta API:
- Agent requests a tool - The agent decides to call `Bash(ls -la)`
- Server pauses agent execution for approval - The request is sent to your terminal
- Letta Code executes locally - The command runs on your machine
- Result sent back - Output is returned to the agent to continue
```typescript
// Simplified: How Letta Code handles tool execution
const response = await client.agents.messages.create(agentId, {
  messages: [{ role: "user", content: userInput }],
});

for (const msg of response.messages) {
  if (msg.message_type === "approval_request_message") {
    // Execute the tool locally
    const result = await executeToolLocally(msg.tool_call);

    // Send the result back to the agent
    await client.agents.messages.create(agentId, {
      messages: [
        {
          type: "approval",
          approvals: [
            {
              type: "tool",
              tool_call_id: msg.tool_call.tool_call_id,
              tool_return: result,
              status: "success",
            },
          ],
        },
      ],
    });
  }
}
```
Streaming
Letta Code uses the SDK’s streaming API to display reasoning, messages, and tool calls in real time.
```typescript
const stream = await client.agents.messages.stream(agentId, {
  messages: [{ role: "user", content: userInput }],
  stream_tokens: true,
});

for await (const chunk of stream) {
  if (chunk.message_type === "reasoning_message") {
    process.stdout.write(chunk.reasoning);
  } else if (chunk.message_type === "assistant_message") {
    process.stdout.write(chunk.content);
  }
}
```
Background mode streaming
Because coding agents are often long-running, Letta Code uses background mode streaming. This decouples agent execution from the client connection, allowing:
- Resumable streams - If your connection drops, you can reconnect and resume from where you left off
- Crash recovery - The agent continues processing on the server even if your terminal closes
- Long operations - Tasks that take 10+ minutes won’t time out
```typescript
// Background mode persists the stream server-side
const stream = await client.agents.messages.stream(agentId, {
  messages: [{ role: "user", content: userInput }],
  stream_tokens: true,
  background: true, // Enable background mode
});

// Each chunk includes run_id and seq_id for resumption
let runId, lastSeqId;
for await (const chunk of stream) {
  if (chunk.run_id && chunk.seq_id) {
    runId = chunk.run_id;
    lastSeqId = chunk.seq_id;
  }
  // Process chunk...
}

// If disconnected, resume from last position
for await (const chunk of client.runs.stream(runId, {
  starting_after: lastSeqId,
})) {
  // Continue processing...
}
```
Conversations
Letta Code organizes message threads into conversations. A single agent can have multiple conversations running in parallel, all sharing the same memory blocks and searchable message history.
This lets you run several coding sessions simultaneously (one refactoring your API, another writing tests) with separate context windows but shared knowledge. Messages from all conversations are pooled together and searchable, so your agent can recall context from any past session.
By default, starting `letta` resumes the “default” conversation with your last-used agent. Use `--new` to start a fresh conversation for parallel sessions, `--continue` to resume your last session exactly where you left off, `--resume` to browse past conversations interactively, or `/resume` during a session to switch between conversations.
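Concretely, the flags described above correspond to invocations like:

```sh
letta            # resume the "default" conversation with your last-used agent
letta --new      # start a fresh conversation for a parallel session
letta --continue # resume your last session exactly where you left off
letta --resume   # browse past conversations interactively
```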
Your Letta Code agents are general Letta agents
An important point: agents created by Letta Code are general-purpose Letta agents. This means they’re fully accessible through:
- The Letta API - Use the Python and TypeScript SDKs or any REST endpoint to access, modify, or message them
- The ADE - View and interact with your agents at app.letta.com
- Other clients - Build your own interfaces on top of the Letta API (e.g. using Vercel AI SDK)
Letta Code simply attaches a set of coding-focused tools and prompts to your agents, and provides a terminal interface for interacting with them. The agents themselves, including their memory and conversation history, live on the Letta server.
This means you can:
- Start a session in Letta Code, then continue in the ADE
- Use the API to programmatically interact with your coding agent
- Build custom UIs (e.g. dashboards) to view your Letta Code agent, even when you’re not at your terminal
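For example, here is a minimal sketch of messaging a Letta Code agent from your own script with the TypeScript SDK, using the same `client.agents.messages.create` call shown earlier. The package import, client options, and agent ID are illustrative assumptions; check the SDK docs for your setup.

```typescript
// Minimal sketch: message a Letta Code agent from your own script.
// The import path, client options, and agent ID below are illustrative.
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: process.env.LETTA_API_KEY });

// Same create() call Letta Code itself uses (see above)
const response = await client.agents.messages.create("agent-xxxxxxxx", {
  messages: [{ role: "user", content: "Summarize what you changed today." }],
});

for (const msg of response.messages) {
  if (msg.message_type === "assistant_message") {
    console.log(msg.content);
  }
}
```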