Create Message Streaming
Process a user message and return the agent’s response.
Deprecated: Use the POST /{agent_id}/messages endpoint with streaming=true in the request body instead.
Note: Sending multiple concurrent requests to the same agent can lead to undefined behavior. Each agent processes messages sequentially, and concurrent requests may interleave in unexpected ways. Wait for each request to complete before sending the next one. Use separate agents or conversations for parallel processing.
This endpoint accepts a message from a user and processes it through the agent. It will stream the steps of the response always, and stream the tokens if ‘stream_tokens’ is set to True.
ParametersExpand Collapse
Deprecatedassistant_message_tool_kwarg: Optional[str]
The name of the message argument in the designated message tool. Still supported for legacy agent types, but deprecated for letta_v1_agent onward.
Deprecatedassistant_message_tool_name: Optional[str]
The name of the designated message tool. Still supported for legacy agent types, but deprecated for letta_v1_agent onward.
client_skills: Optional[Iterable[ClientSkill]]
Client-side skills available in the environment. These are rendered in the system prompt’s available skills section alongside agent-scoped skills from MemFS.
client_tools: Optional[Iterable[ClientTool]]
Client-side tools that the agent can call. When the agent calls a client-side tool, execution pauses and returns control to the client to execute the tool and provide the result via a ToolReturn.
Deprecatedenable_thinking: Optional[str]
If set to True, enables reasoning before responses or tool calls from the agent.
If True, compaction events emit structured SummaryMessage and EventMessage types. If False (default), compaction messages are not included in the response.
Whether to include periodic keepalive ping messages in the stream to prevent connection timeouts (only used when streaming=true).
input: Optional[Union[str, Iterable[InputUnionMember1], null]]
Syntactic sugar for a single user message. Equivalent to messages=[{‘role’: ‘user’, ‘content’: input}].
Iterable[InputUnionMember1]
class ImageContent: …
class ToolCallContent: …
class ReasoningContent: …
messages: Optional[Iterable[Message]]
The messages to be sent to the agent.
class MessageCreate: …
Request to create a message
The content of the message.
List[LettaMessageContentUnion]
class ImageContent: …
class ToolCallContent: …
class ReasoningContent: …
class ApprovalCreate: …
Input to approve or deny a tool call request
approvals: Optional[List[Approval]]
The list of approval responses
class ToolReturn: …
tool_return: Union[List[ToolReturnUnionMember0], str]
The tool return value - either a string or list of content parts (text/image)
List[ToolReturnUnionMember0]
class MessageToolReturnCreate: …
Submit tool return(s) from client-side tool execution.
This is the preferred way to send tool results back to the agent after client-side tool execution. It is equivalent to sending an ApprovalCreate with tool return approvals, but provides a cleaner API for the common case.
List of tool returns from client-side execution
tool_return: Union[List[ToolReturnUnionMember0], str]
The tool return value - either a string or list of content parts (text/image)
List[ToolReturnUnionMember0]
override_model: Optional[str]
Model handle to use for this request instead of the agent’s default model. This allows sending a message to a different model without changing the agent’s configuration.
override_system: Optional[str]
Optional per-request system prompt override. When set, this is passed directly to the underlying LLM request and bypasses the persisted/compiled system message for that request.
If True, returns log probabilities of the output tokens in the response. Useful for RL training. Only supported for OpenAI-compatible providers (including SGLang).
If True, returns token IDs and logprobs for ALL LLM generations in the agent step, not just the last one. Uses SGLang native /generate endpoint. Returns ‘turns’ field with TurnTokenData for each assistant/tool turn. Required for proper multi-turn RL training with loss masking.
Flag to determine if individual tokens should be streamed, rather than streaming per step (only used when streaming=true).
If True, returns a streaming response (Server-Sent Events). If False (default), returns a complete response.
top_logprobs: Optional[int]
Number of most likely tokens to return at each position (0-20). Requires return_logprobs=True.
ReturnsExpand Collapse
Streaming response type for Server-Sent Events (SSE) endpoints. Each event in the stream will be one of these types.
class SystemMessage: …
A message generated by the system. Never streamed back on a response, only used for cursor pagination.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message content (str): The message content sent by the system
class UserMessage: …
A message sent by the user. Never streamed back on a response, only used for cursor pagination.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message content (Union[str, List[LettaUserMessageContentUnion]]): The message content sent by the user (can be a string or an array of multi-modal content parts)
The message content sent by the user (can be a string or an array of multi-modal content parts)
class ReasoningMessage: …
Representation of an agent’s internal reasoning.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message source (Literal[“reasoner_model”, “non_reasoner_model”]): Whether the reasoning content was generated natively by a reasoner model or derived via prompting reasoning (str): The internal reasoning of the agent signature (Optional[str]): The model-generated signature of the reasoning step
otid: Optional[str]
The offline threading id (OTID). Set by the client to deduplicate requests. Used for idempotency in background streaming mode — each message in a request must have a unique OTID. Retries of the same request should reuse the same OTIDs.
class HiddenReasoningMessage: …
Representation of an agent’s internal reasoning where reasoning content has been hidden from the response.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message state (Literal[“redacted”, “omitted”]): Whether the reasoning content was redacted by the provider or simply omitted by the API hidden_reasoning (Optional[str]): The internal reasoning of the agent
class ToolCallMessage: …
A message representing a request to call a tool (generated by the LLM to trigger tool execution).
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message tool_call (Union[ToolCall, ToolCallDelta]): The tool call
otid: Optional[str]
The offline threading id (OTID). Set by the client to deduplicate requests. Used for idempotency in background streaming mode — each message in a request must have a unique OTID. Retries of the same request should reuse the same OTIDs.
tool_calls: Optional[ToolCalls]
List[ToolCall]
class ToolReturnMessage: …
A message representing the return value of a tool call (generated by Letta executing the requested tool).
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message tool_return (str): The return value of the tool (deprecated, use tool_returns) status (Literal[“success”, “error”]): The status of the tool call (deprecated, use tool_returns) tool_call_id (str): A unique identifier for the tool call that generated this message (deprecated, use tool_returns) stdout (Optional[List(str)]): Captured stdout (e.g. prints, logs) from the tool invocation (deprecated, use tool_returns) stderr (Optional[List(str)]): Captured stderr from the tool invocation (deprecated, use tool_returns) tool_returns (Optional[List[ToolReturn]]): List of tool returns for multi-tool support
otid: Optional[str]
The offline threading id (OTID). Set by the client to deduplicate requests. Used for idempotency in background streaming mode — each message in a request must have a unique OTID. Retries of the same request should reuse the same OTIDs.
tool_return: Union[List[ToolReturnUnionMember0], str]
The tool return value - either a string or list of content parts (text/image)
List[ToolReturnUnionMember0]
class AssistantMessage: …
A message sent by the LLM in response to user input. Used in the LLM context.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message content (Union[str, List[LettaAssistantMessageContentUnion]]): The message content sent by the agent (can be a string or an array of content parts)
class ApprovalRequestMessage: …
A message representing a request for approval to call a tool (generated by the LLM to trigger tool execution).
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message tool_call (ToolCall): The tool call
otid: Optional[str]
The offline threading id (OTID). Set by the client to deduplicate requests. Used for idempotency in background streaming mode — each message in a request must have a unique OTID. Retries of the same request should reuse the same OTIDs.
tool_calls: Optional[ToolCalls]
The tool calls that have been requested by the llm to run, which are pending approval
List[ToolCall]
class ApprovalResponseMessage: …
A message representing a response form the user indicating whether a tool has been approved to run.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format name (Optional[str]): The name of the sender of the message approve: (bool) Whether the tool has been approved approval_request_id: The ID of the approval request reason: (Optional[str]) An optional explanation for the provided approval status
approvals: Optional[List[Approval]]
The list of approval responses
class ToolReturn: …
tool_return: Union[List[ToolReturnUnionMember0], str]
The tool return value - either a string or list of content parts (text/image)
List[ToolReturnUnionMember0]
class LettaPing: …
A ping message used as a keepalive to prevent SSE streams from timing out during long running requests.
Args: id (str): The ID of the message date (datetime): The date the message was created in ISO format
message_type: Optional[Literal["ping"]]
The type of the message. Ping messages are a keep-alive to prevent SSE streams from timing out during long running requests.
class LettaErrorMessage: …
Error messages are used to notify the client of an error that occurred during the agent’s execution.
class LettaUsageStatistics: …
Usage statistics for the agent interaction.
Attributes: completion_tokens (int): The number of tokens generated by the agent. prompt_tokens (int): The number of tokens in the prompt. total_tokens (int): The total number of tokens processed by the agent. step_count (int): The number of steps taken by the agent. cached_input_tokens (Optional[int]): The number of input tokens served from cache. None if not reported. cache_write_tokens (Optional[int]): The number of input tokens written to cache. None if not reported. reasoning_tokens (Optional[int]): The number of reasoning/thinking tokens generated. None if not reported.
cache_write_tokens: Optional[int]
The number of input tokens written to cache (Anthropic only). None if not reported by provider.
cached_input_tokens: Optional[int]
The number of input tokens served from cache. None if not reported by provider.
Create Message Streaming
import os
from letta_client import Letta
client = Letta(
api_key=os.environ.get("LETTA_API_KEY"), # This is the default and can be omitted
)
for message in client.agents.messages.stream(
agent_id="agent-123e4567-e89b-42d3-8456-426614174000",
):
print(message){
"id": "id",
"content": "content",
"date": "2019-12-27T18:11:19.117Z",
"is_err": true,
"message_type": "system_message",
"name": "name",
"otid": "otid",
"run_id": "run_id",
"sender_id": "sender_id",
"seq_id": 0,
"step_id": "step_id"
}Returns Examples
{
"id": "id",
"content": "content",
"date": "2019-12-27T18:11:19.117Z",
"is_err": true,
"message_type": "system_message",
"name": "name",
"otid": "otid",
"run_id": "run_id",
"sender_id": "sender_id",
"seq_id": 0,
"step_id": "step_id"
}
Skip to content