Images

Use vision models to analyse images.

Basic usage

agent = Agent(provider="openai")  # Use a vision-capable model

result = await agent.run(
    "What's in this image?",
    images=["photo.jpg"],
)

Multiple images

result = await agent.run(
    "Compare these two images",
    images=["image1.png", "image2.png"],
)

Supported formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • WebP (.webp)
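Each format maps to a MIME type when the image is sent to the provider. A minimal sketch of that mapping is below; the `guess_mime_type` helper is illustrative, not part of the library's API:

```python
from pathlib import Path

# Illustrative mapping from the supported extensions to MIME types.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def guess_mime_type(path: str) -> str:
    """Return the MIME type for a supported image file."""
    ext = Path(path).suffix.lower()
    try:
        return MIME_TYPES[ext]
    except KeyError:
        raise ValueError(f"Unsupported image format: {ext}")
```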

URL images

Pass URLs directly:

result = await agent.run(
    "Describe this image",
    images=["https://example.com/photo.jpg"],
)
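Because local paths and URLs share the same `images` list, an implementation has to tell them apart. One common approach is to check for an http(s) scheme; this `is_url` helper is a hypothetical sketch, not the library's actual code:

```python
from urllib.parse import urlparse

def is_url(source: str) -> bool:
    """Treat strings with an http(s) scheme as remote images."""
    return urlparse(source).scheme in ("http", "https")
```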

With tools

Combine vision with tools:

@tool
def save_description(text: str) -> str:
    """Save image description to file."""
    with open("description.txt", "w") as f:
        f.write(text)
    return "Saved!"

agent = Agent(provider="openai", tools=[save_description])

result = await agent.run(
    "Describe this image and save the description",
    images=["photo.jpg"],
)

Provider support

| Provider  | Vision support  |
|-----------|-----------------|
| OpenAI    | ✅ GPT-4 Vision |
| Anthropic | ✅ Claude 3     |
| Mistral   | ✅ Pixtral      |

How it works

Images are:

  1. Loaded from disk or URL
  2. Base64 encoded
  3. Sent with the appropriate MIME type
  4. Included in the message content

The encoding is handled automatically. Just pass file paths or URLs.
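The steps above can be sketched as a small helper. This is an illustrative assumption about the internals (the message shape follows the common OpenAI-style `image_url` data-URL format), not the library's actual implementation:

```python
import base64
import mimetypes
from pathlib import Path

def encode_image(path: str) -> dict:
    """Load a local image, base64-encode it, and wrap it as message content."""
    mime, _ = mimetypes.guess_type(path)  # MIME type from the file extension
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{data}"},
    }
```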