Find the complete code on GitHub
You can find the code used in this integration directly in OVHcloud's GitHub repository.

Hugging Face Inference Providers¶
Hugging Face Inference Providers offers streamlined, unified access to hundreds of machine learning models powered by world-class inference partners.
Python¶
Before getting started, you'll need:
- Hugging Face Account: Sign up at huggingface.co
- Access Token: Create a token with the "Make calls to Inference Providers" permission at huggingface.co/settings/tokens
Installation¶
Install the required packages:
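A minimal install covering both clients used below, assuming pip (python-dotenv is only needed if you opt for the .env approach):

pip install openai huggingface_hub python-dotenv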
Setting Up Authentication¶
Set your Hugging Face token as an environment variable:
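For example, in a shell (the token value is a placeholder):

export HF_TOKEN=<your_hf_token>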
Or use a .env file:
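A .env file holds the same variable; loading it (for example with python-dotenv, an optional dependency) is up to your application:

HF_TOKEN=<your_hf_token>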
Usage¶
Chat Completion with LLMs¶
Use the OpenAI SDK for familiar, easy-to-use chat completions:
import os
from openai import OpenAI

# The router exposes an OpenAI-compatible API for all providers
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    # The ":ovhcloud" suffix pins the request to the OVHcloud provider
    model="meta-llama/Llama-3.1-8B-Instruct:ovhcloud",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    temperature=0.7,
    max_tokens=100,
)

print(completion.choices[0].message.content)
Using Hugging Face Hub Client¶
Alternatively, use the native Hugging Face Hub client:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

result = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms."
        }
    ],
    temperature=0.7,
    max_tokens=200,
)

print(result.choices[0].message.content)
JavaScript / TypeScript¶
Installation¶
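Assuming npm, install the @huggingface/inference client used below:

npm install @huggingface/inference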
Basic Chat Completion¶
import { InferenceClient } from '@huggingface/inference';

// The token goes to the client; the provider is selected per request
const client = new InferenceClient(process.env.HF_TOKEN);

const result = await client.chatCompletion({
  provider: 'ovhcloud',
  model: 'meta-llama/Llama-3.1-8B-Instruct',
  messages: [
    {
      role: 'user',
      content: 'What is the capital of France?',
    },
  ],
  max_tokens: 100,
});

console.log(result.choices[0].message.content);
Using OpenAI SDK¶
Don't forget to install the OpenAI SDK before executing this program.
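For example, with npm:

npm install openai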
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://router.huggingface.co/v1',
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: 'meta-llama/Llama-3.1-8B-Instruct:ovhcloud',
  messages: [
    {
      role: 'user',
      content: 'Explain machine learning in simple terms.',
    },
  ],
  temperature: 0.7,
  max_tokens: 200,
});

console.log(completion.choices[0].message.content);
Provider Selection Strategies¶
Hugging Face offers flexible provider selection:
Automatic Selection (Default)¶
# Uses the first available provider based on your preference order
model="meta-llama/Llama-3.1-8B-Instruct"
Specific Provider¶
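Append a provider slug to the model id to pin every request to that provider, as the earlier examples do with OVHcloud:

# Always route to OVHcloud
model="meta-llama/Llama-3.1-8B-Instruct:ovhcloud"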
Performance-Based Selection¶
# Select fastest provider (highest throughput)
model="meta-llama/Llama-3.1-8B-Instruct:fastest"
# Select cheapest provider (lowest cost per token)
model="meta-llama/Llama-3.1-8B-Instruct:cheapest"
Setting Provider Preferences¶
Configure your preferred provider order at huggingface.co/settings/inference-providers.
Pricing and Billing¶
Hugging Face Inference Providers uses a pay-as-you-go model:
- Billing: Usage is billed directly to your Hugging Face account
- No Setup Costs: No infrastructure or commitment required
- Transparent Pricing: View pricing details at huggingface.co/pricing
- Cost Control: Monitor usage in your Hugging Face settings
You can also Bring Your Own Key (BYOK) to be billed by OVHcloud instead of Hugging Face. To do so, first add your OVHcloud AI Endpoints API key in your Hugging Face settings.
Going Further¶
Additional Resources¶
- Hugging Face Inference Providers Documentation
- OVHcloud AI Endpoints on Hugging Face Inference Providers documentation
- OVHcloud Models on Hugging Face
- OVHcloud AI Endpoints Catalog
- Hugging Face Hub Python Library
Community and Support¶
- Hugging Face: Join the community at hf.co/join/discord
- OVHcloud: Join our Discord and ask in the #ai-endpoint channel
- Issues: Report problems on GitHub