Find the complete code on GitHub
You can find the code used in this integration directly on OVHcloud's GitHub.

LiteLLM
LiteLLM is a powerful open-source library that provides a unified interface to call 100+ Large Language Model (LLM) APIs using the OpenAI format.
All Models Supported
LiteLLM supports ALL OVHcloud AI Endpoints models. Just use the prefix ovhcloud/ before any model name.
Visit the catalog to explore available models.
Python SDK
Installation
Install LiteLLM using pip:
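pip install litellm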
Or for the full proxy server with all features:
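pip install 'litellm[proxy]'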
Setting Up Authentication
Set your OVHcloud API key as an environment variable:
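export OVHCLOUD_API_KEY="your-api-key"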
Or use a .env file:
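# .env
OVHCLOUD_API_KEY=your-api-key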
Usage
Basic Chat Completion
from litellm import completion
import os

os.environ['OVHCLOUD_API_KEY'] = "your-api-key"

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens=100,
    temperature=0.7,
)

print(response.choices[0].message.content)
Advanced Parameters
Control model behavior with various parameters:
from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate three creative business ideas for sustainable fashion."
        }
    ],
    max_tokens=300,
    temperature=0.9,  # Control randomness (0.0-2.0)
    top_p=0.95,       # Nucleus sampling
    stop=["\n\n"],    # Stop sequences
    user="user-123",  # User identifier for tracking
)

print(response.choices[0].message.content)
Function Calling / Tool Use
Enable models to call functions for extended capabilities:
from litellm import completion
import json

def get_current_weather(location, unit="celsius"):
    """Simulated function to get the weather"""
    if unit == "celsius":
        return {"location": location, "temperature": "22", "unit": "celsius"}
    else:
        return {"location": location, "temperature": "72", "unit": "fahrenheit"}

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and country, e.g. Paris, France"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# First call to get the tool usage decision
response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)

    # Execute the function
    result = get_current_weather(
        location=function_args.get("location"),
        unit=function_args.get("unit", "celsius")
    )
    print(f"Tool result: {result}")
Vision Models
Process images with Vision-Language Models:
from base64 import b64encode
from mimetypes import guess_type

import litellm

def data_url_from_image(file_path):
    """Convert image file to base64 data URL"""
    mime_type, _ = guess_type(file_path)
    if mime_type is None:
        raise ValueError("Could not determine MIME type of the file")

    with open(file_path, "rb") as image_file:
        encoded_string = b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded_string}"

response = litellm.completion(
    model="ovhcloud/Qwen2-VL-72B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image? Describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": data_url_from_image("your_image.jpg"),
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
Structured Output with JSON Schema
Generate structured data reliably with JSON Schema:
from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a specialist in extracting structured data from text. "
                "Format your responses according to the requested schema."
            ),
        },
        {
            "role": "user",
            "content": "Room 12 contains books, a desk, and a lamp."
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "title": "data",
            "name": "data_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "section": {"type": "string"},
                    "products": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["section", "products"],
                "additionalProperties": False
            },
            "strict": False
        }
    },
)

print(response.choices[0].message.content)
# Output: {"section": "Room 12", "products": ["books", "desk", "lamp"]}
Embeddings
Create vector embeddings for semantic search and RAG applications:
from litellm import embedding

response = embedding(
    model="ovhcloud/BGE-M3",
    input=["sample text to embed", "another sample text to embed"]
)

# Access embeddings
for idx, embedding_data in enumerate(response.data):
    print(f"Text {idx}: {len(embedding_data['embedding'])} dimensions")
    print(f"First 5 values: {embedding_data['embedding'][:5]}")
Audio Transcription
Transcribe audio files with Whisper models:
from litellm import transcription

with open("path/to/your/audio.wav", "rb") as audio_file:
    response = transcription(
        model="ovhcloud/whisper-large-v3-turbo",
        file=audio_file,
        language="en",  # Optional: specify language
    )

print(response.text)
LiteLLM Proxy Server (LLM Gateway)
Deploy LiteLLM as a centralized gateway to manage all LLM access for your organization.
Benefits of the Proxy
- Virtual Keys: Create and manage API keys for teams/projects
- Cost Tracking: Monitor spending per user, team, or project
- Rate Limiting: Control usage with custom rate limits
- Caching: Reduce costs with intelligent response caching
- Load Balancing: Distribute traffic across multiple deployments
- Guardrails: Add content moderation and safety checks
- Admin Dashboard: Web UI for monitoring and management
Setting Up the Proxy
1. Create Configuration File
Create a config.yaml file:
model_list:
  - model_name: llama-3.3-70b
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: gpt-oss
    litellm_params:
      model: ovhcloud/gpt-oss-120b
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: mistral-mixtral
    litellm_params:
      model: ovhcloud/Mixtral-8x7B-Instruct-v0_1
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: embeddings
    litellm_params:
      model: ovhcloud/BGE-M3
      api_key: os.environ/OVHCLOUD_API_KEY

# Optional: Enable response caching
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379

# General proxy settings
general_settings:
  master_key: sk-1234  # Your proxy admin key
2. Start the Proxy Server
litellm --config /path/to/config.yaml

# Or with a custom port
litellm --config /path/to/config.yaml --port 8000

# Run with detailed debug logging
litellm --config /path/to/config.yaml --detailed_debug
3. Use the Proxy
Python (OpenAI SDK):
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # Your proxy master key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # Use the model_name from config
    messages=[
        {
            "role": "user",
            "content": "What is machine learning?"
        }
    ],
)

print(response.choices[0].message.content)
cURL:
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3.3-70b",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ]
  }'
Advanced Proxy Features
Virtual Keys
Create keys for different teams or projects:
# Create a virtual key
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_id": "team-engineering",
    "max_budget": 100,
    "budget_duration": "30d",
    "models": ["llama-3.3-70b", "gpt-oss"]
  }'
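The response should include the generated virtual key (an sk-... value), which team members then use in place of the master key. A minimal sketch with a placeholder key:

import openai

client = openai.OpenAI(
    api_key="sk-<generated-virtual-key>",  # placeholder: value returned by /key/generate
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello from the engineering team!"}],
)
print(response.choices[0].message.content)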
Load Balancing
Configure multiple deployments for high availability:
model_list:
  - model_name: llama-3.3-70b
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: llama-3.3-70b  # Same name for load balancing
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY_2

router_settings:
  routing_strategy: simple-shuffle  # or latency-based-routing
Going Further
Additional Resources
- LiteLLM Documentation
- LiteLLM GitHub Repository
- OVHcloud AI Endpoints Catalog
- OVHcloud AI Endpoints Documentation
- LiteLLM Proxy Documentation
- LiteLLM Router Documentation
Community and Support
- LiteLLM: Join the Discord/Slack community
- GitHub: Report issues on GitHub
- OVHcloud: Visit our Discord in the #ai-endpoint channel