
Find the complete code on GitHub

You can find the code used in this integration directly on OVHcloud's GitHub.

LiveKit

LiveKit is a powerful open-source framework and cloud platform for building realtime voice, video, and physical AI agents. It provides a complete set of tools for creating conversational AI experiences with support for Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), and realtime communication using WebRTC.

Why LiveKit with OVHcloud AI Endpoints?

LiveKit's integration with OVHcloud AI Endpoints brings together the best of both worlds:

  • High-Performance Models: Access OVHcloud's optimized open-source models for STT and LLM
  • Low Latency: LiveKit's WebRTC infrastructure ensures realtime interactions
  • Flexible Pipeline: Mix and match STT, LLM, and TTS providers to suit your needs
  • Production-Ready: Built-in load balancing, error handling, and observability

Python

Prerequisites

Before starting this tutorial, make sure you have:

  1. LiveKit Instance: Either sign up for LiveKit Cloud or run a self-hosted LiveKit instance
  2. OVHcloud AI Endpoints API Key: Follow our Generate an API key tutorial

Installation

Getting started

You can find more information on how to get started with LiveKit in their documentation.

Install the LiveKit Agents framework with the required plugins:

# With pip
pip install \
  "livekit-agents[silero,turn-detector,openai]~=1.2" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

# Or with uv
uv add \
  "livekit-agents[silero,turn-detector,openai]~=1.2" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Environment Setup

Create a .env file in your project directory to store your credentials:

LIVEKIT_URL=wss://your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
OVHCLOUD_API_KEY=your-ai-endpoints-api-key

Create Your First Voice Agent

Create a new Python file (for example, agent.py) and paste this code:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from livekit.plugins import openai

load_dotenv(".env")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful and friendly AI assistant. 
            Keep your responses concise and conversational.
            Do not use emojis, formatting, or markdown. Just simple paragraphs and short sentences.
            """,
        )

server = AgentServer()

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    # Assemble the voice pipeline: OVHcloud AI Endpoints handles STT and LLM,
    # Cartesia handles TTS, and Silero handles voice activity detection
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Attach the agent to the room and start processing audio
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=room_io.RoomInputOptions(),
    )

    # Have the agent speak first when a user connects
    await session.generate_reply(
        instructions="Greet the user and introduce yourself."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)

Understanding the Components

Speech-to-Text (STT)

The default STT model used with OVHcloud AI Endpoints is whisper-large-v3-turbo. You can specify a different model by adding the model parameter:

stt=openai.STT.with_ovhcloud(
    language="en",
    model="whisper-large-v3-turbo"  # or another available model
)

Explore available STT models in the OVHcloud AI Endpoints catalog.

Large Language Model (LLM)

OVHcloud AI Endpoints provides access to various open-source language models.

llm=openai.LLM.with_ovhcloud(
    model="gpt-oss-120b",  # Choose your preferred model
    temperature=0.7        # Optional: control randomness; lower values are more deterministic
)
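
Under the hood, the with_ovhcloud helper points LiveKit's OpenAI-compatible client at OVHcloud. If you want to sanity-check a model outside of LiveKit first, here is a minimal sketch using the openai Python client; the base URL below is an assumption, so verify it against your AI Endpoints dashboard:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(".env")

# Assumed OpenAI-compatible base URL for OVHcloud AI Endpoints;
# check your AI Endpoints dashboard for the exact value.
client = OpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key=os.environ["OVHCLOUD_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)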

Voice Activity Detection (VAD)

The VAD module detects when the user is speaking. Silero VAD is a high-quality, lightweight solution:

vad=silero.VAD.load()
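
The defaults work well in most rooms, but VAD.load() accepts tuning parameters. The parameter names below are a sketch based on our understanding of the livekit-plugins-silero API; check the plugin reference for your installed version:

vad=silero.VAD.load(
    min_speech_duration=0.05,   # ignore audio blips shorter than 50 ms
    min_silence_duration=0.55,  # silence required before a speech segment ends
    activation_threshold=0.5,   # raise this to be stricter in noisy environments
)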

Turn Detection

The turn detection module determines when the user has finished speaking, allowing the agent to respond at the right time:

turn_detection=MultilingualModel()
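
If your agent only needs English, the turn-detector plugin also ships a smaller English-only model; the import path below reflects the plugin layout as we understand it:

from livekit.plugins.turn_detector.english import EnglishModel

turn_detection=EnglishModel()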

Text-to-Speech (TTS)

For this example, we use Cartesia's Sonic 3 for natural-sounding speech synthesis. You can also clone your own voice on Cartesia's website and use it with LiveKit!

tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82"
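
Assuming the provider/model:voice-id format shown above, using a cloned voice is just a matter of swapping the identifier after the colon; the value below is a hypothetical placeholder for the voice ID from your Cartesia dashboard:

tts="cartesia/sonic-3:your-cloned-voice-id"  # hypothetical placeholder voice ID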

Running Your Agent

The LiveKit Agents CLI provides several commands to help you develop and test your agent.

View Available Commands

python agent.py --help

Download Required Files

First, download the model files required by the VAD and turn-detection plugins:

python agent.py download-files

Test in Console Mode

To test your agent directly in the terminal:

python agent.py console

Your agent is now live! You can start speaking, and it will respond using OVHcloud AI Endpoints for speech recognition and language processing.

Development Mode

Run your agent in development mode for testing:

python agent.py dev

Production Mode

Deploy your agent with production-ready optimizations:

python agent.py start

Advanced Configuration

Multilingual Support

Change the language parameter to support different languages:

stt=openai.STT.with_ovhcloud(
    language="fr",  # French
    model="whisper-large-v3-turbo"
)

Supported languages include: en, fr, es, de, it, pt, nl, ru, ja, zh, and many more.
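
For example, a fully French-speaking session only needs the STT language, the agent instructions, and the greeting to agree on the target language; a minimal sketch (make sure the TTS voice you pick also supports French):

session = AgentSession(
    stt=openai.STT.with_ovhcloud(language="fr", model="whisper-large-v3-turbo"),
    llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
    tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",  # pick a French-capable voice
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)

await session.generate_reply(
    instructions="Salue l'utilisateur et présente-toi."
)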

Custom Instructions

Customize your agent's personality and behavior by modifying the instructions:

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a medical assistant specializing in patient triage.
            Ask relevant questions about symptoms.
            Provide empathetic and professional responses.
            Always remind users to consult with a healthcare professional.
            """,
        )

Adding Function Tools

Extend your agent's capabilities with function tools. A tool is an async method on your Agent subclass decorated with @function_tool, which the LLM can call during the conversation:

from livekit.agents import function_tool, RunContext

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant with access to weather information.",
        )

    @function_tool
    async def get_weather(self, context: RunContext, location: str) -> str:
        """Get the current weather for a location."""
        # In a real application, you would call a weather API here
        return f"The weather in {location} is sunny and 22°C"

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Tools declared on the Agent subclass are exposed to the LLM automatically
    await session.start(
        agent=Assistant(),
        room=ctx.room,
    )
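
To make the tool do real work, its body just needs to await an HTTP call. Below is a minimal sketch using aiohttp against the free Open-Meteo API (no key required); the URL, query parameters, and response fields are assumptions about that third-party service:

import aiohttp
from livekit.agents import function_tool, RunContext

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant with access to weather information.",
        )

    @function_tool
    async def get_weather(self, context: RunContext, latitude: float, longitude: float) -> str:
        """Get the current temperature for a latitude/longitude pair."""
        url = "https://api.open-meteo.com/v1/forecast"
        params = {"latitude": latitude, "longitude": longitude, "current": "temperature_2m"}
        async with aiohttp.ClientSession() as http:
            async with http.get(url, params=params) as resp:
                data = await resp.json()
        temperature = data["current"]["temperature_2m"]
        return f"The current temperature is {temperature}°C."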

Testing Your Agent

Using the Agents Playground

LiveKit provides a web-based playground for testing your agents:

  1. Make sure your agent is running in dev or start mode
  2. Visit the LiveKit Agents Playground
  3. Connect to your LiveKit instance
  4. Start talking to your agent!

TypeScript

LiveKit Agents also supports TypeScript/Node.js. Here's a quick example:

Installation

npm install @livekit/agents @livekit/agents-plugin-openai

Basic Usage

import { Agent, AgentSession } from '@livekit/agents';
import { STT, LLM } from '@livekit/agents-plugin-openai';

const agent = new Agent({
  instructions: 'You are a helpful AI assistant.',
});

const session = new AgentSession({
  stt: STT.withOVHcloud({ language: 'en' }),
  llm: LLM.withOVHcloud({ model: 'gpt-oss-120b' }),
  tts: 'cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82',
});

// `room` is provided by the job context in a complete agent entrypoint
await session.start({ agent, room });

For complete TypeScript documentation, visit the LiveKit Agents Node.js reference.

Going Further

Community and Support

  • Need help? Join the LiveKit Slack community
  • Have questions about OVHcloud AI Endpoints? Join our Discord and ask in the #ai-endpoint channel
  • Found a bug or want to contribute? Open an issue or PR on GitHub