Local file access
Letta Code agents read files directly from your filesystem — no upload step required
Use a Letta Code agent to extract text from PDFs and answer questions about document content using local file access.
This tutorial shows how to give a Letta agent a PDF and have a conversation about its contents. The agent reads the PDF directly from your local filesystem, extracts the text, and answers questions from memory.
No server-side upload required. The PDF stays on your machine, and the agent uses its tools to read and analyze it.
By the end of this guide, you’ll know how to:

- Give a Letta Code agent a local PDF and get a summary of its contents
- Ask follow-up questions that draw on the extracted text already in context
- Run the same analysis headlessly for scripting and batch processing
Install Letta Code:

```bash
npm install -g @letta-ai/letta-code
```

Install the Letta Code SDK:

```bash
npm install @letta-ai/letta-code
```

Install the Python SDK:

```bash
pip install letta-client
```

You’ll also need a PDF to work with. This tutorial uses the MemGPT research paper as an example:

```bash
curl -L https://arxiv.org/pdf/2310.08560 -o memgpt.pdf
```

Start an interactive Letta Code session and ask the agent to read your PDF:
```bash
letta
```

Then in the conversation:

```
You: Read the file memgpt.pdf and summarize the main ideas.
```

The agent will use its built-in tools to extract the text. It typically runs a command like:

```bash
uv run scripts/extract_pymupdf.py memgpt.pdf memgpt.md
```

Then it reads the extracted markdown and gives you a summary.
The MemGPT paper introduces a system that treats LLM context windows as a constrained memory resource, similar to how operating systems manage physical memory. The key ideas are:
1. Virtual context management — inspired by OS virtual memory, MemGPT pages information in and out of the LLM's context window from external storage
2. Hierarchical memory — a tiered system with main context (always visible to the LLM), recall storage (conversation history), and archival storage (persistent knowledge base)
3. Function-based control flow — the LLM uses function calls to manage its own memory, deciding when to read, write, search, or return responses to the user
They evaluate on two tasks: multi-session chat (where the agent maintains long-term relationships with users) and document analysis (where the agent reasons over documents larger than its context window).

With the Letta Code SDK, the same flow looks like this:

```typescript
import { createAgent } from "@letta-ai/letta-code";

const session = await createAgent({
  permissionMode: "bypassPermissions",
});

const response = await session.prompt(
  "Read the file memgpt.pdf and summarize the main ideas."
);

console.log(response);
```

With the Python SDK, the agent runs on the Letta server. The server-side agent needs access to the PDF, so you’ll need to include the file content in your message or use a custom tool for file access.
For local PDF analysis, we recommend the Letta Code CLI or Letta Code SDK approaches, which give the agent direct filesystem access.
If you’re using the Python SDK with Letta Cloud, you can pass extracted text directly:
```python
from letta_client import Letta
import subprocess
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Extract text locally first
result = subprocess.run(
    ["uv", "run", "scripts/extract_pymupdf.py", "memgpt.pdf", "memgpt.md"],
    capture_output=True,
    text=True,
)

# Read the extracted text
with open("memgpt.md", "r") as f:
    pdf_text = f.read()

# Create an agent and send the text
agent = client.agents.create(
    name="pdf_analyst",
    model="anthropic/claude-sonnet-4-5",
    memory_blocks=[
        {
            "label": "persona",
            "value": "I am a research assistant that analyzes documents and answers questions about their content.",
        },
    ],
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{
        "role": "user",
        "content": f"Here is a research paper. Summarize the main ideas:\n\n{pdf_text[:50000]}",
    }],
    streaming=False,
)

for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)
```

The agent remembers the document content within the conversation. Ask specific questions without re-uploading anything:
```
You: What specific problem does MemGPT solve with its virtual context management?

You: How does the memory hierarchy compare to a traditional operating system?

You: What were the evaluation results for multi-session chat?
```

The agent draws on the extracted text it already has in context to answer each question.
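A note on the `pdf_text[:50000]` truncation in the Python example above: a hard slice can cut a sentence or section in half. As a pure-stdlib sketch (not part of the Letta SDK; `chunk_text` is an illustrative name), you could split on paragraph boundaries instead and send each chunk as its own message:

```python
def chunk_text(text: str, max_chars: int = 50_000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring
    to break on blank lines (paragraph boundaries)."""
    chunks = []
    while len(text) > max_chars:
        # Break at the last blank line before the limit; fall back to a hard cut.
        cut = text.rfind("\n\n", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each chunk stays under the character budget, so none of them silently drops the middle of a paragraph.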
With the Letta Code SDK:

```typescript
const answer1 = await session.prompt(
  "What specific problem does MemGPT solve with its virtual context management?"
);
console.log(answer1);

const answer2 = await session.prompt(
  "How does the memory hierarchy compare to a traditional operating system?"
);
console.log(answer2);
```

With the Python SDK, follow-up questions use the same agent and conversation:

```python
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{
        "role": "user",
        "content": "What specific problem does MemGPT solve with its virtual context management?",
    }],
    streaming=False,
)

for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)
```

For batch processing or integration into scripts, use headless mode to analyze a PDF in a single command:
```bash
letta -p "Read memgpt.pdf and list the three most important contributions" \
  --yolo \
  --output-format text
```

This runs the full agent loop (extraction, analysis, response) and prints just the response text to stdout. Useful for piping into other tools or scripts.
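If you’d rather drive headless mode from Python, a thin subprocess wrapper works. This is a sketch that assumes `letta` is on your PATH; `build_headless_command` and `analyze_pdf` are illustrative names, not SDK functions:

```python
import subprocess

def build_headless_command(prompt: str, output_format: str = "text") -> list[str]:
    # Mirrors the CLI invocation shown above.
    return ["letta", "-p", prompt, "--yolo", "--output-format", output_format]

def analyze_pdf(prompt: str) -> str:
    # Runs the full agent loop and returns just the response text.
    result = subprocess.run(
        build_headless_command(prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

`check=True` raises if the CLI exits non-zero, which is usually what you want in a batch script.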
For structured output:
```bash
letta -p "Read memgpt.pdf and list the three most important contributions" \
  --yolo \
  --output-format json
```

With the Letta Code SDK:

```typescript
import { createAgent } from "@letta-ai/letta-code";

const session = await createAgent({
  permissionMode: "bypassPermissions",
});

// One-shot analysis
const result = await session.prompt(
  "Read memgpt.pdf and list the three most important contributions"
);

console.log(result);

await session.close();
```

For batch processing with the Python SDK, extract text locally and send it to the agent:

```python
papers = ["paper1.pdf", "paper2.pdf", "paper3.pdf"]

for paper in papers:
    subprocess.run(
        ["uv", "run", "scripts/extract_pymupdf.py", paper, f"{paper}.md"],
        capture_output=True,
    )

    with open(f"{paper}.md", "r") as f:
        text = f.read()

    response = client.agents.messages.create(
        agent_id=agent.id,
        messages=[{
            "role": "user",
            "content": f"Summarize this paper in 3 bullet points:\n\n{text[:50000]}",
        }],
        streaming=False,
    )

    for msg in response.messages:
        if msg.message_type == "assistant_message":
            print(f"=== {paper} ===")
            print(msg.content)
```

Letta Code agents can extract text from PDFs using several methods. The extracting-pdf-text skill provides scripts for common scenarios:
| PDF Type | Tool | Command |
|---|---|---|
| Text-based PDF | PyMuPDF | `uv run scripts/extract_pymupdf.py input.pdf output.md` |
| PDF with tables | pdfplumber | `uv run scripts/extract_pdfplumber.py input.pdf output.md` |
| Scanned/image PDF | Mistral OCR | `uv run scripts/extract_mistral_ocr.py input.pdf output.md` |
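If you run the extraction scripts yourself rather than letting the agent pick, the table above maps to a small dispatcher. This is a sketch; the `pdf_type` keys are illustrative labels, not part of the skill:

```python
# Maps a document type to the extraction script from the table above.
EXTRACTORS = {
    "text": "scripts/extract_pymupdf.py",
    "tables": "scripts/extract_pdfplumber.py",
    "scanned": "scripts/extract_mistral_ocr.py",
}

def extraction_command(pdf_type: str, input_pdf: str, output_md: str) -> list[str]:
    """Build the `uv run` command line for the given document type."""
    return ["uv", "run", EXTRACTORS[pdf_type], input_pdf, output_md]
```

Passing the resulting list to `subprocess.run` keeps quoting concerns out of your script.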
The agent will typically choose the right tool automatically. If you’re working with scanned documents or complex layouts, you can tell the agent to use OCR:
```
You: This is a scanned PDF. Use OCR to extract the text from report.pdf
```

For searching across many documents rather than analyzing a single PDF, use qmd:
```bash
# Index a directory of documents
qmd index ./papers/

# Search semantically
qmd search "virtual memory management for LLMs"
```

Your Letta Code agent can run these commands directly to search across your local document collection.
- **Local file access**: Letta Code agents read files directly from your filesystem, with no upload step required
- **Skill-based extraction**: PDF extraction is handled by skills with multiple tool options for different document types
- **Conversational memory**: The agent remembers document content within the conversation for follow-up questions
- **Headless mode**: Run document analysis as a single command for scripting and batch processing