
Find the complete code on GitHub

You can find the code used in this integration directly on OVHcloud's GitHub.

LiveKit

LiveKit is a powerful open-source framework and cloud platform for building realtime voice, video, and physical AI agents. It provides a complete set of tools for creating conversational AI experiences with support for Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), and realtime communication using WebRTC.

Why LiveKit with OVHcloud AI Endpoints?

LiveKit's integration with OVHcloud AI Endpoints brings together the best of both worlds:

  • High-Performance Models: Access OVHcloud's optimized open-source models for STT and LLM
  • Low Latency: LiveKit's WebRTC infrastructure ensures realtime interactions
  • Flexible Pipeline: Mix and match STT, LLM, and TTS providers to suit your needs
  • Production-Ready: Built-in load balancing, error handling, and observability

Python

Prerequisites

Before starting this tutorial, make sure you have:

  1. LiveKit Instance: Either sign up for LiveKit Cloud or run a self-hosted LiveKit instance
  2. OVHcloud AI Endpoints API Key: Follow our Generate an API key tutorial

Installation

Getting started

You can find more information on how to get started with LiveKit in their documentation.

Install the LiveKit Agents framework with the required plugins:

# With pip
pip install \
  "livekit-agents[silero,turn-detector,openai]~=1.2" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

# Or with uv
uv add \
  "livekit-agents[silero,turn-detector,openai]~=1.2" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Environment Setup

Create a .env file in your project directory to store your credentials:

LIVEKIT_URL=wss://your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
OVHCLOUD_API_KEY=your-ai-endpoints-api-key

Create Your First Voice Agent

Create a new Python file (for example, agent.py) and paste this code:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from livekit.plugins import openai

load_dotenv(".env")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful and friendly AI assistant. 
            Keep your responses concise and conversational.
            Do not use emojis, formatting, or markdown. Just simple paragraphs and short sentences.
            """,
        )

server = AgentServer()

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    # Assemble the voice pipeline: OVHcloud AI Endpoints handles STT and LLM,
    # Cartesia handles TTS, and Silero handles voice activity detection
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Attach the agent to the room and start processing audio
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=room_io.RoomInputOptions(),
    )

    # Have the agent speak first when a user connects
    await session.generate_reply(
        instructions="Greet the user and introduce yourself."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)

Understanding the Components

Speech-to-Text (STT)

The default STT model used with OVHcloud AI Endpoints is whisper-large-v3-turbo. You can specify a different model by adding the model parameter:

stt=openai.STT.with_ovhcloud(
    language="en",
    model="whisper-large-v3-turbo"  # or another available model
)

Explore available STT models in the OVHcloud AI Endpoints catalog.

Large Language Model (LLM)

OVHcloud AI Endpoints provides access to various open-source language models.

llm=openai.LLM.with_ovhcloud(
    model="gpt-oss-120b",  # Choose your preferred model
    temperature=0.7        # Optional: control randomness; lower values are more deterministic
)
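
Under the hood, the with_ovhcloud helper points LiveKit's OpenAI-compatible client at OVHcloud. If you want to sanity-check a model outside of LiveKit first, here is a minimal sketch using the openai Python client; the base URL below is an assumption, so verify it against your AI Endpoints dashboard:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(".env")

# Assumed OpenAI-compatible base URL for OVHcloud AI Endpoints;
# check your AI Endpoints dashboard for the exact value.
client = OpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key=os.environ["OVHCLOUD_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)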

Voice Activity Detection (VAD)

The VAD module detects when the user is speaking. Silero VAD is a high-quality, lightweight solution:

vad=silero.VAD.load()
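
The defaults work well in most rooms, but VAD.load() accepts tuning parameters. The parameter names below are a sketch based on our understanding of the livekit-plugins-silero API; check the plugin reference for your installed version:

vad=silero.VAD.load(
    min_speech_duration=0.05,   # ignore audio blips shorter than 50 ms
    min_silence_duration=0.55,  # silence required before a speech segment ends
    activation_threshold=0.5,   # raise this to be stricter in noisy environments
)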

Turn Detection

The turn detection module determines when the user has finished speaking, allowing the agent to respond at the right time:

turn_detection=MultilingualModel()
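
If your agent only needs English, the turn-detector plugin also ships a smaller English-only model; the import path below reflects the plugin layout as we understand it:

from livekit.plugins.turn_detector.english import EnglishModel

turn_detection=EnglishModel()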

Text-to-Speech (TTS)

For this example, we use Cartesia's Sonic 3 for natural-sounding speech synthesis. You can also clone your own voice on Cartesia's website and use it with LiveKit!

tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82"
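
Assuming the provider/model:voice-id format shown above, using a cloned voice is just a matter of swapping the identifier after the colon; the value below is a hypothetical placeholder for the voice ID from your Cartesia dashboard:

tts="cartesia/sonic-3:your-cloned-voice-id"  # hypothetical placeholder voice ID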

Running Your Agent

The LiveKit Agents CLI provides several commands to help you develop and test your agent.

View Available Commands

python agent.py --help

Download Required Files

First, download the model files required by the VAD and turn-detection plugins:

python agent.py download-files

Test in Console Mode

To test your agent directly in the terminal:

python agent.py console

Your agent is now live! You can start speaking, and it will respond using OVHcloud AI Endpoints for speech recognition and language processing.

Development Mode

Run your agent in development mode for testing:

python agent.py dev

Production Mode

Deploy your agent with production-ready optimizations:

python agent.py start

Advanced Configuration

Multilingual Support

Change the language parameter to support different languages:

stt=openai.STT.with_ovhcloud(
    language="fr",  # French
    model="whisper-large-v3-turbo"
)

Supported languages include: en, fr, es, de, it, pt, nl, ru, ja, zh, and many more.
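
For example, a fully French-speaking session only needs the STT language, the agent instructions, and the greeting to agree on the target language; a minimal sketch (make sure the TTS voice you pick also supports French):

session = AgentSession(
    stt=openai.STT.with_ovhcloud(language="fr", model="whisper-large-v3-turbo"),
    llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
    tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",  # pick a French-capable voice
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)

await session.generate_reply(
    instructions="Salue l'utilisateur et présente-toi."
)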

Custom Instructions

Customize your agent's personality and behavior by modifying the instructions:

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a medical assistant specializing in patient triage.
            Ask relevant questions about symptoms.
            Provide empathetic and professional responses.
            Always remind users to consult with a healthcare professional.
            """,
        )

Adding Function Tools

Extend your agent's capabilities with function tools. A tool is an async method on your Agent subclass decorated with @function_tool, which the LLM can call during the conversation:

from livekit.agents import function_tool, RunContext

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant with access to weather information.",
        )

    @function_tool
    async def get_weather(self, context: RunContext, location: str) -> str:
        """Get the current weather for a location."""
        # In a real application, you would call a weather API here
        return f"The weather in {location} is sunny and 22°C"

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Tools declared on the Agent subclass are exposed to the LLM automatically
    await session.start(
        agent=Assistant(),
        room=ctx.room,
    )
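
To make the tool do real work, its body just needs to await an HTTP call. Below is a minimal sketch using aiohttp against the free Open-Meteo API (no key required); the URL, query parameters, and response fields are assumptions about that third-party service:

import aiohttp
from livekit.agents import function_tool, RunContext

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant with access to weather information.",
        )

    @function_tool
    async def get_weather(self, context: RunContext, latitude: float, longitude: float) -> str:
        """Get the current temperature for a latitude/longitude pair."""
        url = "https://api.open-meteo.com/v1/forecast"
        params = {"latitude": latitude, "longitude": longitude, "current": "temperature_2m"}
        async with aiohttp.ClientSession() as http:
            async with http.get(url, params=params) as resp:
                data = await resp.json()
        temperature = data["current"]["temperature_2m"]
        return f"The current temperature is {temperature}°C."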

Testing Your Agent

Using the Agents Playground

LiveKit provides a web-based playground for testing your agents:

  1. Make sure your agent is running in dev or start mode
  2. Visit the LiveKit Agents Playground
  3. Connect to your LiveKit instance
  4. Start talking to your agent!

TypeScript

LiveKit Agents also supports TypeScript/Node.js. Here's a quick example:

Installation

npm install @livekit/agents @livekit/agents-plugin-openai

Basic Usage

import { Agent, AgentSession } from '@livekit/agents';
import { STT, LLM } from '@livekit/agents-plugin-openai';

const agent = new Agent({
  instructions: 'You are a helpful AI assistant.',
});

const session = new AgentSession({
  stt: STT.withOVHcloud({ language: 'en' }),
  llm: LLM.withOVHcloud({ model: 'gpt-oss-120b' }),
  tts: 'cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82',
});

// `room` is provided by the job context in a complete agent entrypoint
await session.start({ agent, room });

For complete TypeScript documentation, visit the LiveKit Agents Node.js reference.

Going Further

Community and Support

  • Need help? Join the LiveKit Slack community
  • Have questions about OVHcloud AI Endpoints? Join our Discord and ask in the #ai-endpoint channel
  • Found a bug or want to contribute? Open an issue or PR on GitHub