Find the complete code on GitHub
You can find the code used in this integration directly on OVHcloud's GitHub.

LiteLLM
LiteLLM is a powerful open-source library that provides a unified interface to call 100+ Large Language Model (LLM) APIs using the OpenAI format.
All Models Supported
LiteLLM supports ALL OVHcloud AI Endpoints models. Just use the prefix ovhcloud/ before any model name.
Visit the catalog to explore available models.
Python SDK
Installation
Install LiteLLM using pip:
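pip install litellm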
Or for the full proxy server with all features:
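pip install 'litellm[proxy]'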
Setting Up Authentication
Set your OVHcloud API key as an environment variable:
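export OVHCLOUD_API_KEY="your-api-key"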
Or use a .env file:
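# .env
OVHCLOUD_API_KEY=your-api-key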
Usage
Basic Chat Completion
from litellm import completion
import os

os.environ['OVHCLOUD_API_KEY'] = "your-api-key"

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens=100,
    temperature=0.7,
)

print(response.choices[0].message.content)
Advanced Parameters
Control model behavior with various parameters:
from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate three creative business ideas for sustainable fashion."
        }
    ],
    max_tokens=300,
    temperature=0.9,  # Control randomness (0.0-2.0)
    top_p=0.95,       # Nucleus sampling
    stop=["\n\n"],    # Stop sequences
    user="user-123",  # User identifier for tracking
)

print(response.choices[0].message.content)
Function Calling / Tool Use
Enable models to call functions for extended capabilities:
from litellm import completion
import json

def get_current_weather(location, unit="celsius"):
    """Simulated function to get the weather"""
    if unit == "celsius":
        return {"location": location, "temperature": "22", "unit": "celsius"}
    else:
        return {"location": location, "temperature": "72", "unit": "fahrenheit"}

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and country, e.g. Paris, France"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# First call to get the tool usage decision
response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)

    # Execute the function
    result = get_current_weather(
        location=function_args.get("location"),
        unit=function_args.get("unit", "celsius")
    )
    print(f"Tool result: {result}")
Vision Models
Process images with Vision-Language Models:
from base64 import b64encode
from mimetypes import guess_type

import litellm

def data_url_from_image(file_path):
    """Convert image file to base64 data URL"""
    mime_type, _ = guess_type(file_path)
    if mime_type is None:
        raise ValueError("Could not determine MIME type of the file")

    with open(file_path, "rb") as image_file:
        encoded_string = b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded_string}"

response = litellm.completion(
    model="ovhcloud/Qwen2-VL-72B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image? Describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": data_url_from_image("your_image.jpg"),
                    }
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
Structured Output with JSON Schema
Generate structured data reliably with JSON Schema:
from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a specialist in extracting structured data from text. "
                "Format your responses according to the requested schema."
            ),
        },
        {
            "role": "user",
            "content": "Room 12 contains books, a desk, and a lamp."
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "title": "data",
            "name": "data_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "section": {"type": "string"},
                    "products": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["section", "products"],
                "additionalProperties": False
            },
            "strict": False
        }
    },
)

print(response.choices[0].message.content)
# Output: {"section": "Room 12", "products": ["books", "desk", "lamp"]}
Embeddings
Create vector embeddings for semantic search and RAG applications:
from litellm import embedding

response = embedding(
    model="ovhcloud/BGE-M3",
    input=["sample text to embed", "another sample text to embed"]
)

# Access embeddings
for idx, embedding_data in enumerate(response.data):
    print(f"Text {idx}: {len(embedding_data['embedding'])} dimensions")
    print(f"First 5 values: {embedding_data['embedding'][:5]}")
Audio Transcription
Transcribe audio files with Whisper models:
from litellm import transcription

with open("path/to/your/audio.wav", "rb") as audio_file:
    response = transcription(
        model="ovhcloud/whisper-large-v3-turbo",
        file=audio_file,
        language="en",  # Optional: specify language
    )

print(response.text)
LiteLLM Proxy Server (LLM Gateway)
Deploy LiteLLM as a centralized gateway to manage all LLM access for your organization.
Benefits of the Proxy
- Virtual Keys: Create and manage API keys for teams/projects
- Cost Tracking: Monitor spending per user, team, or project
- Rate Limiting: Control usage with custom rate limits
- Caching: Reduce costs with intelligent response caching
- Load Balancing: Distribute traffic across multiple deployments
- Guardrails: Add content moderation and safety checks
- Admin Dashboard: Web UI for monitoring and management
Setting Up the Proxy
1. Create Configuration File
Create a config.yaml file:
model_list:
  - model_name: llama-3.3-70b
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: gpt-oss
    litellm_params:
      model: ovhcloud/gpt-oss-120b
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: mistral-mixtral
    litellm_params:
      model: ovhcloud/Mixtral-8x7B-Instruct-v0_1
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: embeddings
    litellm_params:
      model: ovhcloud/BGE-M3
      api_key: os.environ/OVHCLOUD_API_KEY

# Optional: Enable response caching
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379

# General proxy settings
general_settings:
  master_key: sk-1234  # Your proxy admin key
2. Start the Proxy Server
litellm --config /path/to/config.yaml

# Or with a custom port
litellm --config /path/to/config.yaml --port 8000

# Run with detailed debug logging
litellm --config /path/to/config.yaml --detailed_debug
3. Use the Proxy
Python (OpenAI SDK):
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # Your proxy master key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # Use the model_name from config
    messages=[
        {
            "role": "user",
            "content": "What is machine learning?"
        }
    ],
)

print(response.choices[0].message.content)
cURL:
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3.3-70b",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ]
  }'
Advanced Proxy Features
Virtual Keys
Create keys for different teams or projects:
# Create a virtual key
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_id": "team-engineering",
    "max_budget": 100,
    "budget_duration": "30d",
    "models": ["llama-3.3-70b", "gpt-oss"]
  }'
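The response should include the generated virtual key (an sk-... value), which team members then use in place of the master key. A minimal sketch with a placeholder key:

import openai

client = openai.OpenAI(
    api_key="sk-<generated-virtual-key>",  # placeholder: value returned by /key/generate
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello from the engineering team!"}],
)
print(response.choices[0].message.content)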
Load Balancing
Configure multiple deployments for high availability:
model_list:
  - model_name: llama-3.3-70b
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY

  - model_name: llama-3.3-70b  # Same name for load balancing
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: os.environ/OVHCLOUD_API_KEY_2

router_settings:
  routing_strategy: simple-shuffle  # or latency-based-routing
Going Further
Additional Resources
- LiteLLM Documentation
- LiteLLM GitHub Repository
- OVHcloud AI Endpoints Catalog
- OVHcloud AI Endpoints Documentation
- LiteLLM Proxy Documentation
- LiteLLM Router Documentation
Community and Support
- LiteLLM: Join the Discord/Slack community
- GitHub: Report issues on GitHub
- OVHcloud: Visit our Discord in the #ai-endpoint channel