LLM Providers
AutoSchemaKG’s LLMGenerator is designed to be backend-agnostic, supporting various LLM providers through a unified interface. This allows you to switch between proprietary APIs (like OpenAI) and open-source models (via vLLM, HuggingFace, or other serving frameworks) with minimal code changes.
OpenAI-Compatible APIs
The primary way to interface with LLMs is through the OpenAI-compatible API standard. Supported providers include:
OpenAI (GPT-4, GPT-3.5)
DeepInfra (Llama 3, Mixtral, Qwen)
Together AI
vLLM (running as a server)
LocalAI / Ollama / LiteLLM
Configuration
To use an OpenAI-compatible provider, initialize the OpenAI client with the appropriate base_url and api_key, then pass it to LLMGenerator.
from openai import OpenAI
from atlas_rag.llm_generator import LLMGenerator, GenerationConfig
# 1. Configure Generation Parameters
gen_config = GenerationConfig(
    temperature=0.5,
    max_tokens=4096,
    top_p=0.9
)
# 2. Initialize Client (Example: DeepInfra)
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_API_KEY"
)
# 3. Initialize Generator
generator = LLMGenerator(
    client=client,
    model_name="meta-llama/Llama-3-70b-chat-hf",
    max_workers=10,  # Number of concurrent requests
    default_config=gen_config
)
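Because LLMGenerator only depends on an OpenAI-compatible client, switching providers is normally just a matter of changing the client's base_url and api_key plus the model_name; everything else, including gen_config, can be reused. A minimal sketch (the OpenAI client defaults to the official api.openai.com endpoint when no base_url is given):
# Same wiring, different backend: only the client and model name change.
openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

openai_generator = LLMGenerator(
    client=openai_client,
    model_name="gpt-4",
    max_workers=10,
    default_config=gen_config
)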
Using Local vLLM Server
You can run a local LLM using vLLM and connect to it as if it were an OpenAI API.
Start vLLM Server:
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-7B-Instruct \
    --port 8000
Connect via Python:
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"  # vLLM usually doesn't require a key locally
)

generator = LLMGenerator(
    client=client,
    model_name="Qwen/Qwen2.5-7B-Instruct",
    default_config=gen_config
)
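Before wiring the generator into a pipeline, you can verify the server is reachable by listing the models it serves; models.list() is part of the standard openai Python SDK, and the returned id should match the model_name passed to LLMGenerator.
# Quick health check against the running vLLM server.
served = client.models.list()
print([m.id for m in served.data])  # expect "Qwen/Qwen2.5-7B-Instruct"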
Azure OpenAI
For Azure OpenAI, use the AzureOpenAI client.
from openai import AzureOpenAI
client = AzureOpenAI(
    api_key="YOUR_AZURE_API_KEY",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com"
)
generator = LLMGenerator(
    client=client,
    model_name="gpt-4",  # Deployment name
    default_config=gen_config
)
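Note that for Azure, model_name refers to your deployment name rather than the underlying model identifier. In practice the credentials are usually read from the environment instead of hard-coded; a small sketch (the variable names below are a common convention, not something AutoSchemaKG requires):
import os

# Environment-based Azure setup; set these variables to your own values.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-05-15",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)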
Native Local Models (HuggingFace / vLLM Offline)
Note: Offline vLLM and HuggingFace pipelines (without an API server) are supported via specific LLMGenerator subclasses or configurations. Ensure you have the necessary packages installed (vllm, transformers, torch).
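For reference, this is what an offline (in-process) vLLM engine looks like on its own, using the plain vllm API with no server involved; the LLMGenerator subclasses mentioned above build on an engine of this kind, so consult the package for the exact class to use in your version.
from vllm import LLM, SamplingParams

# In-process vLLM engine: the model runs inside this Python process, no HTTP server.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.5, top_p=0.9, max_tokens=256)
outputs = llm.generate(["Extract triples from: Alice works at Acme."], params)
print(outputs[0].outputs[0].text)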
The GenerationConfig class includes specific parameters for these backends:
vLLM:
min_p, use_beam_search, guided_json, guided_regex
HuggingFace:
repetition_penalty, truncation, padding
Example of configuring backend-specific parameters:
gen_config = GenerationConfig(
    temperature=0.7,
    # vLLM specific
    min_p=0.05,
    guided_json=my_json_schema,
    # HuggingFace specific
    repetition_penalty=1.1
)
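The my_json_schema value above is whatever JSON Schema you want vLLM's guided decoding to enforce on the model output. A hypothetical schema for triple extraction might look like this (adjust the structure to your own output format):
# Hypothetical JSON Schema for guided_json; not part of AutoSchemaKG itself.
my_json_schema = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "array",
                "items": {"type": "string"},
                "minItems": 3,
                "maxItems": 3
            }
        }
    },
    "required": ["triples"]
}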