Multi-modal (image inputs)
Send images to your agents
Multi-modal features require compatible language models. Ensure your agent is configured with a multi-modal capable model.
Letta agents support image inputs, enabling richer conversations and more powerful agent capabilities.
Model Support
Multi-modal capabilities depend on the underlying language model. To confirm that a given model supports image inputs, check the provider's individual model pages. Examples of models with image support include:
- OpenAI: GPT-4.1, GPT-4o, o1/o3/o4
- Anthropic: Claude Opus 4, Claude Sonnet 4
- Gemini: Gemini 2.5 Pro, Gemini 2.5 Flash
If the model you’re using doesn’t support image inputs, the image will still appear in the agent's context window, but as a text message telling the agent that an image exists.
ADE Support
In the Agent Development Environment (ADE), you can pass images to your agents by dragging and dropping them into the chat window, or by clicking the image icon to upload a file manually.
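Outside the ADE, images are sent programmatically as part of a user message whose content mixes image and text parts. The sketch below shows one way to build such a payload in Python by base64-encoding the image bytes; the exact part schema (`type`, `source`, `media_type`, `data`) is an assumption modeled on common multi-modal message formats — check the Letta API reference for the fields your SDK version expects.

```python
import base64


def build_image_message(image_bytes: bytes, media_type: str, prompt: str) -> dict:
    """Build a user message combining an image part and a text part.

    NOTE: the field names used here are an assumption based on common
    multi-modal message schemas, not a guaranteed match for Letta's API.
    """
    # Base64-encode the raw image bytes so they can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,  # e.g. "image/png", "image/jpeg"
                    "data": encoded,
                },
            },
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder bytes stand in for a real image file read from disk.
message = build_image_message(b"\x89PNG...", "image/png", "What is in this image?")
```

A payload like this would then be passed to your agent's message-send endpoint (for example, a hypothetical `client.agents.messages.create(agent_id=..., messages=[message])` call); consult the Letta SDK documentation for the exact method name and signature.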

