
Talk to your PDF

Use a Letta Code agent to extract text from PDFs and answer questions about document content using local file access.

This tutorial shows how to give a Letta agent a PDF and have a conversation about its contents. The agent reads the PDF directly from your local filesystem, extracts the text, and answers questions from memory.

No server-side upload required. The PDF stays on your machine, and the agent uses its tools to read and analyze it.

By the end of this guide, you’ll know how to:

  • Point an agent at a local PDF file
  • Have the agent extract and analyze the content
  • Ask follow-up questions in the same conversation
  • Use headless mode for scripted document analysis

Install Letta Code:

```sh
npm install -g @letta-ai/letta-code
```

You’ll also need a PDF to work with. This tutorial uses the MemGPT research paper as an example:

```sh
curl -L https://arxiv.org/pdf/2310.08560 -o memgpt.pdf
```

Step 1: Read and summarize the PDF

Start an interactive Letta Code session and ask the agent to read your PDF:

```sh
letta
```

Then in the conversation:

You: Read the file memgpt.pdf and summarize the main ideas.

The agent will use its built-in tools to extract the text. It typically runs a command like:

```sh
uv run scripts/extract_pymupdf.py memgpt.pdf memgpt.md
```

It then reads the extracted markdown and gives you a summary.

Example output:

```
The MemGPT paper introduces a system that treats LLM context windows
as a constrained memory resource, similar to how operating systems
manage physical memory. The key ideas are:
1. Virtual context management — inspired by OS virtual memory, MemGPT
   pages information in and out of the LLM's context window from
   external storage
2. Hierarchical memory — a tiered system with main context (always
   visible to the LLM), recall storage (conversation history), and
   archival storage (persistent knowledge base)
3. Function-based control flow — the LLM uses function calls to manage
   its own memory, deciding when to read, write, search, or return
   responses to the user
They evaluate on two tasks: multi-session chat (where the agent
maintains long-term relationships with users) and document analysis
(where the agent reasons over documents larger than its context window).
```

Step 2: Ask follow-up questions

The agent remembers the document content within the conversation, so you can ask specific questions without pointing it at the file again:

You: What specific problem does MemGPT solve with its virtual context management?
You: How does the memory hierarchy compare to a traditional operating system?
You: What were the evaluation results for multi-session chat?

The agent draws on the extracted text it already has in context to answer each question.

Step 3: Headless mode for scripted analysis


For batch processing or integration into scripts, use headless mode to analyze a PDF in a single command:

```sh
letta -p "Read memgpt.pdf and list the three most important contributions" \
  --yolo \
  --output-format text
```

This runs the full agent loop (extraction, analysis, response) and prints just the response text to stdout. Useful for piping into other tools or scripts.
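Because headless mode is just a command, it composes with ordinary shell loops. Here is a minimal batch sketch, assuming `letta` is installed and authenticated and that `papers/` holds your PDFs (the prompt wording is our own):

```sh
# Summarize every PDF in papers/, writing one .summary.txt per source file.
for pdf in papers/*.pdf; do
  letta -p "Read $pdf and summarize it in three bullet points" \
    --yolo \
    --output-format text > "${pdf%.pdf}.summary.txt"
done
```

The `${pdf%.pdf}` expansion strips the extension, so each summary lands next to its source file.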

For structured output:

```sh
letta -p "Read memgpt.pdf and list the three most important contributions" \
  --yolo \
  --output-format json
```
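The JSON schema depends on your Letta Code version, so inspect it before scripting against specific fields. A safe first step is to capture the output and pretty-print it for inspection (a sketch that assumes only that stdout is a single JSON document):

```sh
# Capture the structured result, then pretty-print it for inspection.
letta -p "Read memgpt.pdf and list the three most important contributions" \
  --yolo --output-format json > result.json
python3 -m json.tool result.json
```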

Letta Code agents can extract text from PDFs using several methods. The extracting-pdf-text skill provides scripts for common scenarios:

| PDF type | Tool | Command |
| --- | --- | --- |
| Text-based PDF | PyMuPDF | `uv run scripts/extract_pymupdf.py input.pdf output.md` |
| PDF with tables | pdfplumber | `uv run scripts/extract_pdfplumber.py input.pdf output.md` |
| Scanned/image PDF | Mistral OCR | `uv run scripts/extract_mistral_ocr.py input.pdf output.md` |
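If you want to script extraction yourself, the table above maps onto a small dispatch helper. This is a sketch: the `extract` function and its `kind` values are our own convention, and the scripts come from the extracting-pdf-text skill:

```sh
# Pick an extraction script by document type (text / tables / scanned).
extract() {
  kind="$1"; src="$2"; dst="$3"
  case "$kind" in
    text)    uv run scripts/extract_pymupdf.py     "$src" "$dst" ;;
    tables)  uv run scripts/extract_pdfplumber.py  "$src" "$dst" ;;
    scanned) uv run scripts/extract_mistral_ocr.py "$src" "$dst" ;;
    *)       echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}
```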

The agent will typically choose the right tool automatically. If you’re working with scanned documents or complex layouts, you can tell the agent to use OCR:

You: This is a scanned PDF. Use OCR to extract the text from report.pdf

For searching across many documents rather than analyzing a single PDF, use qmd:

```sh
# Index a directory of documents
qmd index ./papers/
# Search semantically
qmd search "virtual memory management for LLMs"
```

Your Letta Code agent can run these commands directly to search across your local document collection.
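You can also chain the two tools yourself: search first, then hand the best match to the agent. A sketch, assuming `qmd search` prints matching file paths one per line (verify this against your qmd version):

```sh
# Find the most relevant paper, then ask the agent about it.
top=$(qmd search "virtual memory management for LLMs" | head -n 1)
letta -p "Read $top and explain its approach to virtual memory" \
  --yolo --output-format text
```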

To recap:

  • Local file access: Letta Code agents read files directly from your filesystem, with no upload step required
  • Skill-based extraction: PDF extraction is handled by skills with multiple tool options for different document types
  • Conversational memory: the agent remembers document content within the conversation for follow-up questions
  • Headless mode: run document analysis as a single command for scripting and batch processing