Multi-modal (image inputs)

Send images to your agents

Multi-modal features require a compatible language model. Make sure your agent is configured with a model that accepts image inputs.

Letta agents accept image inputs, so you can ask an agent to describe, analyze, or answer questions about images alongside your text messages.

Model Support

Multi-modal capabilities depend on the underlying language model. Models that support image inputs include the following; for the complete list, see each provider's model pages:

  • OpenAI: GPT-4.1, GPT-4o, o1, o3, o4-mini
  • Anthropic: Claude Opus 4, Claude Sonnet 4
  • Gemini: Gemini 2.5 Pro, Gemini 2.5 Flash

If the model you’re using doesn’t support image inputs, your images still appear in the agent’s context window, but only as a text message telling the agent that an image was sent.
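
Because a model without image support only sees a text placeholder, you may want to check an agent's model on the client before sending an image. The sketch below is a hypothetical helper, not part of the Letta SDK. It assumes model handles of the form `provider/model-name`, hard-codes the models listed above (reading "o1/3/4" as o1, o3, and o4-mini, which is an assumption), and ignores date-stamped version suffixes. Treat the allowlist as illustrative; the providers' model pages remain the source of truth.

```python
import re

# Hypothetical allowlist copied from the model list above; not exhaustive.
# "o4" is assumed to mean o4-mini.
IMAGE_CAPABLE_MODELS = {
    "gpt-4.1",
    "gpt-4o",
    "o1",
    "o3",
    "o4-mini",
    "claude-opus-4",
    "claude-sonnet-4",
    "gemini-2.5-pro",
    "gemini-2.5-flash",
}

# Matches a trailing date stamp such as "-2024-08-06" or "-20250514".
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def supports_image_input(model_handle: str) -> bool:
    """Return True if the handle names a model from the allowlist above."""
    # Drop an optional "provider/" prefix, e.g. "openai/gpt-4o" -> "gpt-4o".
    name = model_handle.rsplit("/", 1)[-1].lower()
    # Drop a date-stamped version, e.g. "claude-sonnet-4-20250514" -> "claude-sonnet-4".
    name = _DATE_SUFFIX.sub("", name)
    return name in IMAGE_CAPABLE_MODELS


if __name__ == "__main__":
    for handle in ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "openai/gpt-3.5-turbo"]:
        print(handle, supports_image_input(handle))
```

Exact matching is deliberately conservative: variants that are not listed above, such as "mini" models, return False here even if the provider does accept images for them.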

ADE Support

In the ADE (Agent Development Environment), you can send images to an agent by dragging and dropping them into the chat window, or by clicking the image icon and choosing a file to upload.

Usage Examples (SDK)

Sending an Image via URL

from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# agent_state is an existing agent, e.g. the object returned by client.agents.create(...)
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": [
                # Image part, referenced by a publicly accessible URL
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
                    },
                },
                # Text part: the instruction that accompanies the image
                {
                    "type": "text",
                    "text": "Describe this image.",
                },
            ],
        }
    ],
)
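
If you send images often, it can help to wrap the message structure from the example above in a small function. The helper below is hypothetical (it is not part of `letta_client`); it only builds the same dictionary shown above, an image part followed by a text part, and leaves sending to `client.agents.messages.create`.

```python
from typing import Any


def image_url_message(url: str, prompt: str) -> dict[str, Any]:
    """Build a user message with an image (by URL) followed by a text prompt.

    Hypothetical helper: mirrors the message shape used in the example above.
    """
    return {
        "role": "user",
        "content": [
            # Image part: referenced by a publicly accessible URL
            {"type": "image", "source": {"type": "url", "url": url}},
            # Text part: the instruction that accompanies the image
            {"type": "text", "text": prompt},
        ],
    }


message = image_url_message(
    "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
    "Describe this image.",
)
print(message["content"][0]["source"]["type"])  # url
```

You could then pass `messages=[message]` to `client.agents.messages.create`, exactly as in the example.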

Sending an Image via Base64

import base64

import httpx
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# Download the image and base64-encode its raw bytes
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_response = httpx.get(image_url)
image_response.raise_for_status()  # fail early instead of encoding an error page
image_data = base64.standard_b64encode(image_response.content).decode("utf-8")

# agent_state is an existing agent, e.g. the object returned by client.agents.create(...)
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": [
                # Image part: inline base64 data plus its MIME type
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                # Text part: the instruction that accompanies the image
                {
                    "type": "text",
                    "text": "Describe this image.",
                },
            ],
        }
    ],
)
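
The example above downloads the image first; to send a file from disk instead, you only need its bytes and a media type. The sketch below is a hypothetical helper, not an SDK function: it infers `media_type` from the file extension with Python's `mimetypes` module and base64-encodes the bytes into the same `source` structure shown above. Which image formats are accepted depends on the model provider, so the check here only rejects non-image types; it does not verify that a particular model supports a given format.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any


def base64_image_part(data: bytes, filename: str) -> dict[str, Any]:
    """Build a base64 image content part; the media type is guessed from filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot infer an image media type for {filename!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }


def base64_image_part_from_file(path: str) -> dict[str, Any]:
    """Read an image from disk and wrap it as a base64 content part."""
    p = Path(path)
    return base64_image_part(p.read_bytes(), p.name)


# The first bytes of any JPEG file (FF D8 FF) encode to "/9j/".
part = base64_image_part(b"\xff\xd8\xff", "ant.jpg")
print(part["source"]["media_type"], part["source"]["data"])  # image/jpeg /9j/
```

The resulting part replaces the inline image dictionary in the example's `content` list, alongside the text part.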