Find the complete code on GitHub
You can find the code used in this integration directly on OVHcloud's GitHub.

LiveKit¶
LiveKit is a powerful open-source framework and cloud platform for building realtime voice, video, and physical AI agents. It provides a complete set of tools for creating conversational AI experiences with support for Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), and realtime communication using WebRTC.
Why LiveKit with OVHcloud AI Endpoints?¶
LiveKit's integration with OVHcloud AI Endpoints brings together the best of both worlds:
- High-Performance Models: Access OVHcloud's optimized open-source models for STT and LLM
- Low Latency: LiveKit's WebRTC infrastructure ensures realtime interactions
- Flexible Pipeline: Mix and match STT, LLM, and TTS providers to suit your needs
- Production-Ready: Built-in load balancing, error handling, and observability
Python¶
Prerequisites¶
Before starting this tutorial, make sure you have:
- LiveKit Instance: Either sign up on LiveKit Cloud or run a self-hosted LiveKit instance
- OVHcloud AI Endpoints API Key: Follow our Generate an API key tutorial
Installation¶
Getting started
You can find more information on how to get started with LiveKit on their documentation.
Install the LiveKit Agents framework with the required plugins:
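A typical installation looks like the following; the exact package extras can vary between LiveKit Agents releases, so treat this as a sketch to check against the LiveKit documentation:

pip install "livekit-agents[openai,silero,turn-detector]" python-dotenv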
Environment Setup¶
Create a .env file in your project directory to store your credentials:
LIVEKIT_URL=wss://your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
OVHCLOUD_API_KEY=your-ai-endpoints-api-key
Create Your First Voice Agent¶
Create a new Python file (for example, agent.py) and paste this code:
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from livekit.plugins import openai

load_dotenv(".env")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful and friendly AI assistant.
            Keep your responses concise and conversational.
            Do not use emojis, formatting, or markdown. Just simple paragraphs and short sentences.
            """,
        )


server = AgentServer()


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=room_io.RoomInputOptions(),
    )

    await session.generate_reply(
        instructions="Greet the user and introduce yourself."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
Understanding the Components¶
Speech-to-Text (STT)¶
The default STT model used with OVHcloud AI Endpoints is whisper-large-v3-turbo. You can specify a different model by adding the model parameter:
stt=openai.STT.with_ovhcloud(
    language="en",
    model="whisper-large-v3-turbo",  # or another available model
)
Explore available STT models in the OVHcloud AI Endpoints catalog.
Large Language Model (LLM)¶
OVHcloud AI Endpoints provides access to a range of open-source language models. Select the one you want with the model parameter:
llm=openai.LLM.with_ovhcloud(
    model="gpt-oss-120b",  # Choose your preferred model
    temperature=0.7,       # Optional: control randomness (0.0-1.0)
)
Voice Activity Detection (VAD)¶
The VAD module detects when the user is speaking. Silero VAD is a high-quality, lightweight solution:
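The corresponding line from the agent above simply loads the bundled model:

vad=silero.VAD.load()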
Turn Detection¶
The turn detection module determines when the user has finished speaking, allowing the agent to respond at the right time:
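In the example above this is the multilingual turn-detection plugin:

turn_detection=MultilingualModel()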
Text-to-Speech (TTS)¶
For this example, we use Cartesia's Sonic 3 for natural-sounding speech synthesis. You can also clone your own voice on Cartesia's website and use it with LiveKit!
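The corresponding setting from the example passes the Cartesia model as an inference string (the identifier after the colon is assumed to select the voice):

tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82"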
Running Your Agent¶
The LiveKit Agents CLI provides several commands to help you develop and test your agent.
View Available Commands¶
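Assuming your entrypoint file is named agent.py, as in this guide, the CLI lists its commands when invoked with --help:

python agent.py --help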
Download Required Files¶
First, download the necessary model files:
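Still assuming the entrypoint is agent.py, this should fetch the Silero VAD and turn-detection model weights:

python agent.py download-files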
Test in Console Mode¶
To test your agent directly in the terminal:
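python agent.py console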
Your agent is now live! You can start speaking, and it will respond using OVHcloud AI Endpoints for speech recognition and language processing.
Development Mode¶
Run your agent in development mode for testing:
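python agent.py dev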
Production Mode¶
Deploy your agent with production-ready optimizations:
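python agent.py start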
Advanced Configuration¶
Multilingual Support¶
Change the language parameter to support different languages:
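For example, to transcribe French instead of English (reusing the STT call from the example above):

stt=openai.STT.with_ovhcloud(language="fr")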
Supported languages include: en, fr, es, de, it, pt, nl, ru, ja, zh, and many more.
Custom Instructions¶
Customize your agent's personality and behavior by modifying the instructions:
class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a medical assistant specializing in patient triage.
            Ask relevant questions about symptoms.
            Provide empathetic and professional responses.
            Always remind users to consult with a healthcare professional.
            """,
        )
Adding Function Tools¶
Extend your agent's capabilities with custom functions:
from livekit.agents import function_tool


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant with access to weather information.",
        )


@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # In a real application, you would call a weather API
    return f"The weather in {location} is sunny and 22°C"


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=openai.STT.with_ovhcloud(language="en"),
        llm=openai.LLM.with_ovhcloud(model="gpt-oss-120b"),
        tts="cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    agent = Assistant()
    agent.tools.append(get_weather)

    await session.start(
        agent=agent,
        room=ctx.room,
    )
Testing Your Agent¶
Using the Agents Playground¶
LiveKit provides a web-based playground for testing your agents:
- Make sure your agent is running in dev or start mode
- Visit the LiveKit Agents Playground
- Connect to your LiveKit instance
- Start talking to your agent!
Typescript¶
LiveKit Agents also supports TypeScript/Node.js. Here's a quick example:
Installation¶
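A typical npm installation might look like this; the plugin package name is an assumption to verify against the LiveKit Agents Node.js documentation:

npm install @livekit/agents @livekit/agents-plugin-openai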
Basic Usage¶
import { Agent, AgentSession } from '@livekit/agents';
import { STT, LLM } from '@livekit/agents-plugin-openai';

const agent = new Agent({
  instructions: 'You are a helpful AI assistant.',
});

const session = new AgentSession({
  stt: STT.withOVHcloud({ language: 'en' }),
  llm: LLM.withOVHcloud({ model: 'gpt-oss-120b' }),
  tts: 'cartesia/sonic-3:38f41816-0cff-4573-a072-5ce98993ae82',
});

await session.start({ agent, room });
For complete TypeScript documentation, visit the LiveKit Agents Node.js reference.
Going Further¶
Additional Resources¶
- LiveKit Agents Documentation - Complete framework documentation
- Voice AI Quickstart - Step-by-step guide to building agents
- LiveKit Agent Builder - No-code agent prototyping
- OVHcloud AI Endpoints Getting Started - Learn about AI Endpoints capabilities
Community and Support¶
- Need help? Join the LiveKit Slack community
- Have questions about OVHcloud AI Endpoints? Visit our Discord in the #ai-endpoint channel
- Found a bug or want to contribute? Open an issue or PR on GitHub