Cerebras

Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Agno integrates directly with the Cerebras Python SDK, allowing you to use state-of-the-art Llama models with a simple interface.

Prerequisites

To use Cerebras with Agno, you need to:

Install the required packages:
```
uv pip install cerebras-cloud-sdk
```
Set your API key: The Cerebras SDK expects your API key to be available as an environment variable:
```
export CEREBRAS_API_KEY=your_api_key_here
```
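
It can help to fail early with a clear message when the variable is missing. The following is a minimal sketch using only the standard library; the helper name `require_cerebras_api_key` is hypothetical and is not part of Agno or the Cerebras SDK.

```python
import os
from typing import Mapping


def require_cerebras_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise with a clear message."""
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set. Export it before creating the model."
        )
    return key


if __name__ == "__main__":
    # Only reports whether the key is present; the key itself is never printed.
    try:
        require_cerebras_api_key()
        print("CEREBRAS_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.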

Basic Usage

Here’s how to use a Cerebras model with Agno:

```python
from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")
```

Supported Models

Cerebras currently supports the following models (see docs for the latest list):

| Model Name | Model ID | Parameters | Knowledge Cutoff |
| --- | --- | --- | --- |
| Llama 4 Scout | `llama-4-scout-17b-16e-instruct` | 109 billion | August 2024 |
| Llama 3.1 8B | `llama3.1-8b` | 8 billion | March 2023 |
| Llama 3.3 70B | `llama-3.3-70b` | 70 billion | December 2023 |
| DeepSeek R1 Distill Llama 70B* | `deepseek-r1-distill-llama-70b` | 70 billion | December 2023 |

* DeepSeek R1 Distill Llama 70B is available in private preview.
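
If you select models by display name in your own code, the table can be mirrored as a small lookup. This is only a sketch: `ModelInfo`, `CEREBRAS_MODELS`, and `resolve_model_id` are hypothetical names, the entries are copied from the table above, and the list may be out of date relative to the Cerebras docs.

```python
from typing import Dict, NamedTuple


class ModelInfo(NamedTuple):
    model_id: str
    private_preview: bool = False


# Copied from the table above; check the Cerebras docs for the current list.
CEREBRAS_MODELS: Dict[str, ModelInfo] = {
    "Llama 4 Scout": ModelInfo("llama-4-scout-17b-16e-instruct"),
    "Llama 3.1 8B": ModelInfo("llama3.1-8b"),
    "Llama 3.3 70B": ModelInfo("llama-3.3-70b"),
    "DeepSeek R1 Distill Llama 70B": ModelInfo(
        "deepseek-r1-distill-llama-70b", private_preview=True
    ),
}


def resolve_model_id(name: str, allow_preview: bool = False) -> str:
    """Map a display name to its model ID, refusing preview models by default."""
    try:
        info = CEREBRAS_MODELS[name]
    except KeyError:
        known = ", ".join(sorted(CEREBRAS_MODELS))
        raise ValueError(f"Unknown model {name!r}; known models: {known}") from None
    if info.private_preview and not allow_preview:
        raise ValueError(
            f"{name} is in private preview; pass allow_preview=True to use it"
        )
    return info.model_id
```

The returned string is the value you would pass as `id` when constructing `Cerebras`.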

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | `"llama-4-scout-17b-16e-instruct"` | The ID of the Cerebras model to use |
| `name` | `str` | `"Cerebras"` | The name of the model |
| `provider` | `str` | `"Cerebras"` | The provider of the model |
| `parallel_tool_calls` | `Optional[bool]` | `None` | Whether to run tool calls in parallel (automatically set to `False` for llama-4-scout) |
| `max_completion_tokens` | `Optional[int]` | `None` | Maximum number of completion tokens to generate |
| `repetition_penalty` | `Optional[float]` | `None` | Penalty for repeating tokens (higher values reduce repetition) |
| `temperature` | `Optional[float]` | `None` | Controls randomness in the model's output (0.0 to 2.0) |
| `top_p` | `Optional[float]` | `None` | Controls diversity via nucleus sampling (0.0 to 1.0) |
| `top_k` | `Optional[int]` | `None` | Controls diversity via top-k sampling |
| `strict_output` | `bool` | `True` | Controls schema adherence for structured outputs |
| `extra_headers` | `Optional[Any]` | `None` | Additional headers to include in requests |
| `extra_query` | `Optional[Any]` | `None` | Additional query parameters to include in requests |
| `extra_body` | `Optional[Any]` | `None` | Additional body parameters to include in requests |
| `request_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters to include in the request |
| `api_key` | `Optional[str]` | `None` | The API key for authenticating with Cerebras (defaults to the `CEREBRAS_API_KEY` environment variable) |
| `base_url` | `Optional[Union[str, httpx.URL]]` | `None` | The base URL for the Cerebras API |
| `timeout` | `Optional[float]` | `None` | Request timeout in seconds |
| `max_retries` | `Optional[int]` | `None` | Maximum number of retries for failed requests |
| `default_headers` | `Optional[Any]` | `None` | Default headers to include in all requests |
| `default_query` | `Optional[Any]` | `None` | Default query parameters to include in all requests |
| `http_client` | `Optional[httpx.Client]` | `None` | HTTP client instance for making requests |
| `client_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters for client configuration |
| `client` | `Optional[CerebrasClient]` | `None` | A pre-configured instance of the Cerebras client |
| `async_client` | `Optional[AsyncCerebrasClient]` | `None` | A pre-configured instance of the async Cerebras client |

`Cerebras` is a subclass of the `Model` class and accepts the same parameters.
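
As an illustration of how the sampling parameters fit together, the sketch below validates `temperature` and `top_p` against the ranges listed in the table and drops anything left unset, so the model's own defaults still apply. The helper names `sampling_kwargs` and `build_model` are hypothetical; only the keyword names and ranges come from the table, and the import of `Cerebras` is deferred so the validation helper can be used without Agno installed.

```python
from typing import Any, Dict, Optional


def sampling_kwargs(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Check the documented ranges and return only the settings that were given."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_completion_tokens": max_completion_tokens,
    }
    # Unset values are omitted so the model falls back to its defaults.
    return {name: value for name, value in candidates.items() if value is not None}


def build_model(model_id: str, **settings: Any):
    """Construct the Agno model with validated sampling settings."""
    # Deferred import: requires the agno package at call time only.
    from agno.models.cerebras import Cerebras

    return Cerebras(id=model_id, **sampling_kwargs(**settings))
```

For example, `build_model("llama3.1-8b", temperature=0.2, top_p=0.9)` would pass only those two settings through to `Cerebras`.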

Structured Outputs

The Cerebras model supports structured outputs using JSON schema:

```python
from agno.agent import Agent
from agno.models.cerebras import Cerebras
from pydantic import BaseModel
from typing import List

# Schema the model's output should conform to
class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    response_format=MovieScript,
)
```
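
To make schema adherence concrete without calling the API, the sketch below hand-checks a JSON payload against the same three fields as `MovieScript`. It uses only the standard library, and `MovieScriptData` and `parse_movie_script` are hypothetical names. It illustrates what the schema requires; it is not how Agno or Cerebras validate responses, and a Pydantic model performs equivalent checks for you.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class MovieScriptData:
    setting: str
    characters: List[str]
    plot: str


def parse_movie_script(raw: str) -> MovieScriptData:
    """Parse a JSON string and check it has exactly the MovieScript fields and types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"setting", "characters", "plot"}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["setting"], str) or not isinstance(data["plot"], str):
        raise ValueError("setting and plot must be strings")
    characters = data["characters"]
    if not isinstance(characters, list) or not all(
        isinstance(item, str) for item in characters
    ):
        raise ValueError("characters must be a list of strings")
    return MovieScriptData(**data)
```

A payload with a missing or extra key, or a non-string entry in `characters`, raises `ValueError` instead of silently producing a partial object.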

Resources

SDK Examples

View more examples here.
