OLLAMA_API_KEY and start using powerful models instantly.
Key Features
- Dual Deployment Options: Choose between local hosting for privacy and control, or cloud hosting for scalability
- Seamless Switching: Easy transition between local and cloud deployments with minimal code changes
- Auto-configuration: When using an API key, the host automatically defaults to Ollama Cloud
- Wide Model Support: Access to an extensive library of open-source models, including GPT-OSS, Llama, Qwen, DeepSeek, and Phi
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "llama3.2" | The name of the Ollama model to use |
| name | str | "Ollama" | The name of the model |
| provider | str | "Ollama" | The provider of the model |
| host | str | "http://localhost:11434" | The host URL for the Ollama server |
| timeout | Optional[int] | None | Request timeout in seconds |
| format | Optional[str] | None | The format to return the response in (e.g., "json") |
| options | Optional[Dict[str, Any]] | None | Additional model options (temperature, top_p, etc.) |
| keep_alive | Optional[Union[float, str]] | None | How long to keep the model loaded (e.g., "5m", or 3600 for seconds) |
| template | Optional[str] | None | The prompt template to use |
| system | Optional[str] | None | The system message to use |
| raw | Optional[bool] | None | Whether to return the raw response without formatting |
| stream | bool | True | Whether to stream the response |
| retries | int | 0 | Number of retries to attempt before raising a ModelProviderError |
| delay_between_retries | int | 1 | Delay between retries, in seconds |
| exponential_backoff | bool | False | If True, the delay between retries doubles each time |
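To make the parameters concrete, here is a minimal sketch of how they might map onto a request body for Ollama's REST API (`POST /api/generate`), together with the retry-delay rule from the last three rows. The `build_payload` and `retry_delay` helpers are hypothetical, not the library's actual internals; the payload fields (`model`, `format`, `options`, `keep_alive`, `system`, `raw`, `stream`) follow Ollama's documented API, and the parameter names mirror the table above.

```python
from typing import Any, Dict, Optional, Union


def build_payload(
    prompt: str,
    id: str = "llama3.2",              # names shadow builtins to match the table
    format: Optional[str] = None,
    options: Optional[Dict[str, Any]] = None,
    keep_alive: Optional[Union[float, str]] = None,
    system: Optional[str] = None,
    raw: Optional[bool] = None,
    stream: bool = True,
) -> Dict[str, Any]:
    """Assemble a request body for Ollama's /api/generate endpoint,
    omitting any parameter left at None."""
    payload: Dict[str, Any] = {"model": id, "prompt": prompt, "stream": stream}
    for key, value in [
        ("format", format), ("options", options),
        ("keep_alive", keep_alive), ("system", system), ("raw", raw),
    ]:
        if value is not None:
            payload[key] = value
    return payload


def retry_delay(attempt: int, delay_between_retries: int = 1,
                exponential_backoff: bool = False) -> float:
    """Seconds to wait before retry number `attempt` (0-based): a fixed
    delay, or one that doubles on each attempt when backoff is enabled."""
    if exponential_backoff:
        return delay_between_retries * (2 ** attempt)
    return delay_between_retries


payload = build_payload("Why is the sky blue?",
                        options={"temperature": 0.2}, keep_alive="5m")
print(payload["model"], retry_delay(3, 1, True))  # llama3.2 8
```

With `exponential_backoff=True` the waits grow 1s, 2s, 4s, ... between attempts, which is gentler on an overloaded server than retrying at a fixed interval.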