
Image inputs

Pass image data to your agents in the Letta API

The Letta API supports image inputs, enabling LLMs that have vision capabilities to see images natively.

Multi-modal capabilities depend on the underlying language model. Check each provider's model pages to see which of their models support image inputs:

  • OpenAI: most recent models, including GPT-4.1, the o1/o3/o4 series, and GPT-4o
  • Anthropic: most recent models, including Claude Opus 4 and Claude Sonnet 4
  • Gemini: most recent models, including Gemini 2.5 Pro and Gemini 2.5 Flash

If the provider you’re using doesn’t support image inputs, your images will still appear in the context window, but as a text message telling the agent that an image exists.

Passing an image by URL:

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const response = await client.agents.messages.create(agentState.id, {
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe this image.",
        },
        {
          type: "image",
          source: {
            type: "url",
            url: "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
          },
        },
      ],
    },
  ],
});
```
Alternatively, fetch the image yourself and pass it as base64-encoded data:

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const imageUrl =
  "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg";
const imageResponse = await fetch(imageUrl);
const imageBuffer = await imageResponse.arrayBuffer();
const imageData = Buffer.from(imageBuffer).toString("base64");

const response = await client.agents.messages.create(agentState.id, {
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe this image.",
        },
        {
          type: "image",
          source: {
            type: "base64",
            mediaType: "image/jpeg",
            data: imageData,
          },
        },
      ],
    },
  ],
});
```
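The same base64 pattern works for images on disk. The sketch below is an assumption-laden variant, not part of the Letta SDK: `imageMessage` and `mediaTypeFor` are hypothetical helper names, and the extension-to-media-type mapping is an illustrative subset. It reads a local file, infers the media type from the extension, and builds the `messages` payload:

```typescript
import fs from "node:fs/promises";
import path from "node:path";

// Map a file extension to an image media type. The extension list here is an
// assumption -- extend it for any other formats your model provider accepts.
function mediaTypeFor(filePath: string): string {
  const types: Record<string, string> = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
  };
  return types[path.extname(filePath).toLowerCase()] ?? "application/octet-stream";
}

// Build the `messages` payload for a local image file (hypothetical helper,
// not part of the Letta SDK).
async function imageMessage(filePath: string, prompt: string) {
  // Read the file and base64-encode its bytes.
  const imageData = (await fs.readFile(filePath)).toString("base64");
  return [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        {
          type: "image",
          source: {
            type: "base64",
            mediaType: mediaTypeFor(filePath),
            data: imageData,
          },
        },
      ],
    },
  ];
}
```

The resulting array can then be passed as the `messages` field, e.g. `await client.agents.messages.create(agentState.id, { messages: await imageMessage("./ant.jpg", "Describe this image.") })`.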