Message Format
This page documents the Python message and response schema used by the Agent SDK. It mirrors the structure shown in Chat History and provides precise type definitions you can target in your own code.
All examples below use Python type hints with TypedDict
and Literal
from the standard typing
module.
Response
The agent yields response chunks as an async generator of objects with output
and usage
.
from typing import List, TypedDict
class Usage(TypedDict, total=False):
prompt_tokens: int
completion_tokens: int
total_tokens: int
response_cost: float # USD cost if available
class AgentResponse(TypedDict):
output: List["AgentMessage"]
usage: Usage
Messages
Agent messages represent the state of the conversation and the agent's actions.
from typing import List, Literal, Optional, TypedDict, Union
# Union of all message variants
AgentMessage = Union[
"UserMessage",
"AssistantMessage",
"ReasoningMessage",
"ComputerCallMessage",
"ComputerCallOutputMessage",
"FunctionCallMessage",
"FunctionCallOutputMessage",
]
# Input message (role: user/system/developer)
class UserMessage(TypedDict, total=False):
type: Literal["message"] # optional for user input
role: Literal["user", "system", "developer"]
content: Union[str, List["InputContent"]]
# Output message (assistant text)
class AssistantMessage(TypedDict):
type: Literal["message"]
role: Literal["assistant"]
content: List["OutputContent"]
# Output reasoning/thinking message
class ReasoningMessage(TypedDict):
type: Literal["reasoning"]
summary: List["SummaryContent"]
# Output computer action call (agent intends to act)
class ComputerCallMessage(TypedDict):
type: Literal["computer_call"]
call_id: str
status: Literal["completed", "failed", "pending"]
action: "ComputerAction"
# Output computer action result (always a screenshot)
class ComputerCallOutputMessage(TypedDict):
type: Literal["computer_call_output"]
call_id: str
output: "ComputerResultContent"
# Output function call (agent calls a Python tool)
class FunctionCallMessage(TypedDict):
type: Literal["function_call"]
call_id: str
status: Literal["completed", "failed", "pending"]
name: str
arguments: str # JSON-serialized kwargs
# Output function call result (text)
class FunctionCallOutputMessage(TypedDict):
type: Literal["function_call_output"]
call_id: str
output: str
Message Content
These content items appear inside content
arrays for the message types above.
# Input content kinds
class InputContent(TypedDict):
type: Literal["input_image", "input_text"]
text: Optional[str]
image_url: Optional[str] # e.g., data URL
# Assistant output content
class OutputContent(TypedDict):
type: Literal["output_text"]
text: str
# Reasoning/summary output content
class SummaryContent(TypedDict):
type: Literal["summary_text"]
text: str
# Computer call outputs (screenshots)
class ComputerResultContent(TypedDict):
type: Literal["computer_screenshot", "input_image"]
image_url: str # data URL (e.g., "data:image/png;base64,....")
Actions
Computer actions represent concrete operations the agent will perform on the computer.
Two broad families exist depending on the provider: OpenAI-style and Anthropic-style.
# Union of all supported computer actions
ComputerAction = Union[
"ClickAction",
"DoubleClickAction",
"DragAction",
"KeyPressAction",
"MoveAction",
"ScreenshotAction",
"ScrollAction",
"TypeAction",
"WaitAction",
# Anthropic variants
"LeftMouseDownAction",
"LeftMouseUpAction",
]
# OpenAI Computer Actions
class ClickAction(TypedDict):
type: Literal["click"]
button: Literal["left", "right", "wheel", "back", "forward"]
x: int
y: int
class DoubleClickAction(TypedDict, total=False):
type: Literal["double_click"]
button: Literal["left", "right", "wheel", "back", "forward"]
x: int
y: int
class DragAction(TypedDict, total=False):
type: Literal["drag"]
button: Literal["left", "right", "wheel", "back", "forward"]
path: List[tuple[int, int]] # [(x1, y1), (x2, y2), ...]
class KeyPressAction(TypedDict):
type: Literal["keypress"]
keys: List[str] # e.g., ["ctrl", "a"]
class MoveAction(TypedDict):
type: Literal["move"]
x: int
y: int
class ScreenshotAction(TypedDict):
type: Literal["screenshot"]
class ScrollAction(TypedDict):
type: Literal["scroll"]
scroll_x: int
scroll_y: int
x: int
y: int
class TypeAction(TypedDict):
type: Literal["type"]
text: str
class WaitAction(TypedDict):
type: Literal["wait"]
# Anthropic Computer Actions
class LeftMouseDownAction(TypedDict):
type: Literal["left_mouse_down"]
x: int
y: int
class LeftMouseUpAction(TypedDict):
type: Literal["left_mouse_up"]
x: int
y: int
Notes
- The agent runtime may add provider-specific fields when available (e.g., usage cost). Unknown fields should be ignored for forward compatibility.
- Computer action outputs are screenshots as data URLs. For security and storage, some serializers may redact or omit large fields in persisted metadata.
- The message flow typically alternates between reasoning, actions, screenshots, and concluding assistant text. See Chat History for a step-by-step example.