Find the complete code on GitHub
You can find the code used in this integration directly in OVHcloud's GitHub repository.

Hugging Face Inference Providers¶
Hugging Face Inference Providers offers streamlined, unified access to hundreds of machine learning models powered by world-class inference partners.
Python¶
Before getting started, you'll need:
- Hugging Face Account: Sign up at huggingface.co
- Access Token: Create a token with the "Make calls to Inference Providers" permission at huggingface.co/settings/tokens
Installation¶
Install the required packages:
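A minimal install covering both clients used below, assuming pip (python-dotenv is only needed if you opt for the .env approach):

pip install openai huggingface_hub python-dotenv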
Setting Up Authentication¶
Set your Hugging Face token as an environment variable:
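For example, in a shell (the token value is a placeholder):

export HF_TOKEN=<your_hf_token>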
Or use a .env file:
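A .env file holds the same variable; loading it (for example with python-dotenv, an optional dependency) is up to your application:

HF_TOKEN=<your_hf_token>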
Usage¶
Chat Completion with LLMs¶
Use the OpenAI SDK for familiar, easy-to-use chat completions:
import os
from openai import OpenAI

# The router exposes an OpenAI-compatible API for all providers
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    # The ":ovhcloud" suffix pins the request to the OVHcloud provider
    model="meta-llama/Llama-3.1-8B-Instruct:ovhcloud",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    temperature=0.7,
    max_tokens=100,
)

print(completion.choices[0].message.content)
Using Hugging Face Hub Client¶
Alternatively, use the native Hugging Face Hub client:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

result = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms."
        }
    ],
    temperature=0.7,
    max_tokens=200,
)

print(result.choices[0].message.content)
JavaScript / TypeScript¶
Installation¶
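Assuming npm, install the @huggingface/inference client used below:

npm install @huggingface/inference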
Basic Chat Completion¶
import { InferenceClient } from '@huggingface/inference';

// The token goes to the client; the provider is selected per request
const client = new InferenceClient(process.env.HF_TOKEN);

const result = await client.chatCompletion({
  provider: 'ovhcloud',
  model: 'meta-llama/Llama-3.1-8B-Instruct',
  messages: [
    {
      role: 'user',
      content: 'What is the capital of France?',
    },
  ],
  max_tokens: 100,
});

console.log(result.choices[0].message.content);
Using OpenAI SDK¶
Don't forget to install the OpenAI SDK before executing this program.
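For example, with npm:

npm install openai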
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://router.huggingface.co/v1',
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: 'meta-llama/Llama-3.1-8B-Instruct:ovhcloud',
  messages: [
    {
      role: 'user',
      content: 'Explain machine learning in simple terms.',
    },
  ],
  temperature: 0.7,
  max_tokens: 200,
});

console.log(completion.choices[0].message.content);
Provider Selection Strategies¶
Hugging Face offers flexible provider selection:
Automatic Selection (Default)¶
# Uses the first available provider based on your preference order
model="meta-llama/Llama-3.1-8B-Instruct"
Specific Provider¶
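Append a provider slug to the model id to pin every request to that provider, as the earlier examples do with OVHcloud:

# Always route to OVHcloud
model="meta-llama/Llama-3.1-8B-Instruct:ovhcloud"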
Performance-Based Selection¶
# Select fastest provider (highest throughput)
model="meta-llama/Llama-3.1-8B-Instruct:fastest"
# Select cheapest provider (lowest cost per token)
model="meta-llama/Llama-3.1-8B-Instruct:cheapest"
Setting Provider Preferences¶
Configure your preferred provider order at huggingface.co/settings/inference-providers.
Pricing and Billing¶
Hugging Face Inference Providers uses a pay-as-you-go model:
- Billing: Usage is billed directly to your Hugging Face account
- No Setup Costs: No infrastructure or commitment required
- Transparent Pricing: View pricing details at huggingface.co/pricing
- Cost Control: Monitor usage in your Hugging Face settings
You can also Bring Your Own Key (BYOK) to be billed by OVHcloud instead of Hugging Face. To do so, first add your OVHcloud AI Endpoints API key in your Hugging Face settings.
Going Further¶
Additional Resources¶
- Hugging Face Inference Providers Documentation
- OVHcloud AI Endpoints on Hugging Face Inference Providers documentation
- OVHcloud Models on Hugging Face
- OVHcloud AI Endpoints Catalog
- Hugging Face Hub Python Library
Community and Support¶
- Hugging Face: Join the community at hf.co/join/discord
- OVHcloud: Join our Discord and ask in the #ai-endpoint channel
- Issues: Report problems on GitHub