How it works
Understand how Letta Code is built and how to customize it
Letta Code is a lightweight CLI harness built around the Letta TypeScript SDK. It gives Letta agents (running on a server, local or remote) the ability to interact with your local development environment.
Architecture overview
```mermaid
flowchart TB
  subgraph Terminal["Your terminal"]
    subgraph LettaCode["Letta Code"]
      UI["CLI UI / Headless"]
      Executor["Tool Executor"]
      Permissions["Permission Manager"]
      SDK["Letta TS SDK"]
    end
  end
  Server["Letta server"]
  UI --> SDK
  Executor --> SDK
  Permissions --> SDK
  SDK <-->|Letta API| Server
```
Letta Code’s main job is to:
- Manage the messaging lifecycle - Send messages between the user and agent
- Execute tools locally - Run Bash, Read, Write, Edit, etc. on your machine
- Handle permissions - Prompt the user for approval before tool execution
- Provide a terminal UI - Render agent state, streaming responses, and tool calls
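To make the permission step concrete, here is a minimal sketch of how a harness might gate tool calls behind a terminal prompt. This is illustrative only: the `ToolCall` shape and the `execute` callback are hypothetical, not Letta Code’s actual internals.

```typescript
// Minimal sketch of a permission gate (illustrative, not Letta Code's
// actual internals): ask the user before any tool call touches the machine.
import * as readline from "node:readline/promises";

type ToolCall = { name: string; arguments: string };

async function promptUser(question: string): Promise<boolean> {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  const answer = await rl.question(question);
  rl.close();
  return answer.trim().toLowerCase() === "y";
}

async function runWithApproval(
  toolCall: ToolCall,
  execute: (tc: ToolCall) => Promise<string>,
): Promise<string> {
  const approved = await promptUser(
    `Allow ${toolCall.name}(${toolCall.arguments})? [y/N] `,
  );
  if (!approved) return "Tool call denied by user";
  // Only reaches the local machine after explicit approval
  return execute(toolCall);
}
```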
Client-side tool execution
The core mechanism that makes Letta Code work is client-side tool execution. Your Letta agent runs on an external server, but the tools it calls - like Bash, Read, and Write - execute locally on your machine.
This is done using the client-side tools feature of the Letta API:
- Agent requests a tool - The agent decides to call `Bash(ls -la)`
- Server pauses agent execution for approval - The request is sent to your terminal
- Letta Code executes locally - The command runs on your machine
- Result sent back - Output is returned to the agent to continue
```typescript
// Simplified: How Letta Code handles tool execution
const response = await client.agents.messages.create(agentId, {
  messages: [{ role: "user", content: userInput }],
});

for (const msg of response.messages) {
  if (msg.message_type === "approval_request_message") {
    // Execute the tool locally
    const result = await executeToolLocally(msg.tool_call);

    // Send the result back to the agent
    await client.agents.messages.create(agentId, {
      messages: [
        {
          type: "approval",
          approvals: [
            {
              type: "tool",
              tool_call_id: msg.tool_call.tool_call_id,
              tool_return: result,
              status: "success",
            },
          ],
        },
      ],
    });
  }
}
```
Streaming
Letta Code uses the SDK’s streaming API to display reasoning, messages, and tool calls in real time.
```typescript
const stream = await client.agents.messages.stream(agentId, {
  messages: [{ role: "user", content: userInput }],
  stream_tokens: true,
});

for await (const chunk of stream) {
  if (chunk.message_type === "reasoning_message") {
    process.stdout.write(chunk.reasoning);
  } else if (chunk.message_type === "assistant_message") {
    process.stdout.write(chunk.content);
  }
}
```
Background mode streaming
Because coding agents are often long-running, Letta Code uses background mode streaming. This decouples agent execution from the client connection, allowing:
- Resumable streams - If your connection drops, you can reconnect and resume from where you left off
- Crash recovery - The agent continues processing on the server even if your terminal closes
- Long operations - Tasks that take 10+ minutes won’t time out
```typescript
// Background mode persists the stream server-side
const stream = await client.agents.messages.stream(agentId, {
  messages: [{ role: "user", content: userInput }],
  stream_tokens: true,
  background: true, // Enable background mode
});

// Each chunk includes run_id and seq_id for resumption
let runId, lastSeqId;
for await (const chunk of stream) {
  if (chunk.run_id && chunk.seq_id) {
    runId = chunk.run_id;
    lastSeqId = chunk.seq_id;
  }
  // Process chunk...
}

// If disconnected, resume from last position
for await (const chunk of client.runs.stream(runId, {
  starting_after: lastSeqId,
})) {
  // Continue processing...
}
```
Conversations
Letta Code organizes message threads into conversations. A single agent can have multiple conversations running in parallel, all sharing the same memory blocks and searchable message history.
This lets you run several coding sessions simultaneously (one refactoring your API, another writing tests) with separate context windows but shared knowledge. Messages from all conversations are pooled together and searchable, so your agent can recall context from any past session.
By default, starting `letta` resumes the “default” conversation with your last-used agent. Use `--new` to start a fresh conversation for parallel sessions, `--continue` to resume your last session exactly where you left off, `--resume` to browse past conversations interactively, or `/resume` during a session to switch between conversations.
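Concretely, the flags described above correspond to invocations like:

```sh
letta            # resume the "default" conversation with your last-used agent
letta --new      # start a fresh conversation for a parallel session
letta --continue # resume your last session exactly where you left off
letta --resume   # browse past conversations interactively
```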
Your Letta Code agents are general Letta agents
An important point: agents created by Letta Code are general-purpose Letta agents. This means they’re fully accessible through:
- The Letta API - Use the Python and TypeScript SDKs or any REST endpoint to access, modify, or message them
- The ADE - View and interact with your agents at app.letta.com
- Other clients - Build your own interfaces on top of the Letta API (e.g. using Vercel AI SDK)
Letta Code simply attaches a set of coding-focused tools and prompts to your agents, and provides a terminal interface for interacting with them. The agents themselves, including their memory and conversation history, live on the Letta server.
This means you can:
- Start a session in Letta Code, then continue in the ADE
- Use the API to programmatically interact with your coding agent
- Build custom UIs (e.g. dashboards) to view your Letta Code agent, even when you’re not at your terminal
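For example, here is a minimal sketch of messaging a Letta Code agent from your own script with the TypeScript SDK, using the same `client.agents.messages.create` call shown earlier. The package import, client options, and agent ID are illustrative assumptions; check the SDK docs for your setup.

```typescript
// Minimal sketch: message a Letta Code agent from your own script.
// The import path, client options, and agent ID below are illustrative.
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: process.env.LETTA_API_KEY });

// Same create() call Letta Code itself uses (see above)
const response = await client.agents.messages.create("agent-xxxxxxxx", {
  messages: [{ role: "user", content: "Summarize what you changed today." }],
});

for (const msg of response.messages) {
  if (msg.message_type === "assistant_message") {
    console.log(msg.content);
  }
}
```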