Multi-modal (image inputs)

Send images to your agents

Multi-modal features require compatible language models. Ensure your agent is configured with a multi-modal capable model.

Letta agents support image inputs, enabling richer conversations and more powerful agent capabilities.

Model Support

Multi-modal capabilities depend on the underlying language model. To see which models from each API provider support image inputs, check that provider's model pages:

  • OpenAI: GPT-4.1, o1/3/4, GPT-4o
  • Anthropic: Claude Opus 4, Claude Sonnet 4
  • Gemini: Gemini 2.5 Pro, Gemini 2.5 Flash

If the provider you’re using doesn’t support image inputs, your images will still appear in the context window, but as a text message telling the agent that an image exists.

ADE Support

You can pass images to your agents by dragging and dropping them into the chat window, or by clicking the image icon to upload a file manually.

Usage Examples (SDK)

Sending an Image via URL

import { LettaClient } from '@letta-ai/letta-client';

// Replace the placeholder with your Letta API key.
const client = new LettaClient({ token: "LETTA_API_KEY" });

// `agentState` is assumed to be an agent you have already created or retrieved.
const response = await client.agents.messages.create(
  agentState.id, {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: {
              type: "url",
              url: "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
            },
          },
          {
            type: "text",
            text: "Describe this image."
          }
        ],
      }
    ],
  }
);
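
If you send image URLs from several places in your code, it can help to build the message in one small function. The following is a minimal sketch, not part of the Letta SDK: `buildImageUrlMessage` and its types are hypothetical names that mirror the payload shape used in the example above, and the http(s) check is an assumption about which URLs are worth sending.

```typescript
// Hypothetical types mirroring the message shape in the URL example;
// they are not imported from the Letta SDK.
type ImageUrlPart = { type: "image"; source: { type: "url"; url: string } };
type TextPart = { type: "text"; text: string };
type UserMessage = { role: "user"; content: Array<ImageUrlPart | TextPart> };

// Builds a user message containing one image (by URL) followed by a text prompt.
function buildImageUrlMessage(imageUrl: string, prompt: string): UserMessage {
  // Assumption: only http(s) URLs are useful here, so reject anything else early.
  if (!/^https?:\/\//i.test(imageUrl)) {
    throw new Error(`Expected an http(s) image URL, got: ${imageUrl}`);
  }
  return {
    role: "user",
    content: [
      { type: "image", source: { type: "url", url: imageUrl } },
      { type: "text", text: prompt },
    ],
  };
}
```

The result could then be passed as `messages: [buildImageUrlMessage(url, "Describe this image.")]` in the `client.agents.messages.create` call.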

Sending an Image via Base64

import { LettaClient } from '@letta-ai/letta-client';

// Replace the placeholder with your Letta API key.
const client = new LettaClient({ token: "LETTA_API_KEY" });

// Download the image and encode its bytes as a base64 string.
const imageUrl = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg";
const imageResponse = await fetch(imageUrl);
const imageBuffer = await imageResponse.arrayBuffer();
const imageData = Buffer.from(imageBuffer).toString('base64');

// `agentState` is assumed to be an agent you have already created or retrieved.
const response = await client.agents.messages.create(
  agentState.id, {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: {
              type: "base64",
              // Should match the actual format of the encoded image.
              media_type: "image/jpeg",
              data: imageData,
            },
          },
          {
            type: "text",
            text: "Describe this image."
          }
        ],
      }
    ],
  }
);
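
The example above hard-codes `media_type: "image/jpeg"`, which only fits JPEG data. As a minimal sketch, assuming you know the file name of the image, the hypothetical helper below (`toBase64ImagePart` is not part of the Letta SDK) infers the media type from the extension and builds the same image part. The extension list is an assumption; which formats a given model accepts is something to confirm with your provider.

```typescript
// Hypothetical type mirroring the base64 image part in the example above.
type Base64ImagePart = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

// Common image extensions and their MIME types (an illustrative, not exhaustive, list).
const MEDIA_TYPES = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);

// Builds an image content part from a file name and already base64-encoded data.
function toBase64ImagePart(fileName: string, base64Data: string): Base64ImagePart {
  // Take the text after the last dot as the extension, case-insensitively.
  const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
  const mediaType = MEDIA_TYPES.get(ext);
  if (mediaType === undefined) {
    throw new Error(`Cannot infer an image media type from: ${fileName}`);
  }
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: base64Data },
  };
}
```

In Node, the `base64Data` argument could come from a local file, for example `(await readFile(path)).toString("base64")` with `readFile` imported from `node:fs/promises`.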