# LLM Providers

AutoSchemaKG's `LLMGenerator` is designed to be backend-agnostic, supporting various LLM providers through a unified interface. This allows you to switch between proprietary APIs (like OpenAI) and open-source models (via vLLM, HuggingFace, or other serving frameworks) with minimal code changes.

## OpenAI-Compatible APIs

The primary way to interface with LLMs is through the OpenAI-compatible API standard, which covers:

- **OpenAI** (GPT-4, GPT-3.5)
- **DeepInfra** (Llama 3, Mixtral, Qwen)
- **Together AI**
- **vLLM** (running as a server)
- **LocalAI** / **Ollama** / **LiteLLM**

### Configuration

To use an OpenAI-compatible provider, initialize the `OpenAI` client with the appropriate `base_url` and `api_key`, then pass it to `LLMGenerator`.

```python
from openai import OpenAI
from atlas_rag.llm_generator import LLMGenerator, GenerationConfig

# 1. Configure Generation Parameters
gen_config = GenerationConfig(
    temperature=0.5,
    max_tokens=4096,
    top_p=0.9
)

# 2. Initialize Client (Example: DeepInfra)
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_API_KEY"
)

# 3. Initialize Generator
generator = LLMGenerator(
    client=client,
    model_name="meta-llama/Llama-3-70b-chat-hf",
    max_workers=10,  # Number of concurrent requests
    default_config=gen_config
)
```

### Using a Local vLLM Server

You can run a local LLM using vLLM and connect to it as if it were an OpenAI API.

1. **Start vLLM Server:**

   ```bash
   python -m vllm.entrypoints.openai.api_server \
       --model Qwen/Qwen2.5-7B-Instruct \
       --port 8000
   ```

2. **Connect via Python:**

   ```python
   client = OpenAI(
       base_url="http://localhost:8000/v1",
       api_key="EMPTY"  # vLLM usually doesn't require a key locally
   )

   generator = LLMGenerator(
       client=client,
       model_name="Qwen/Qwen2.5-7B-Instruct",
       default_config=gen_config
   )
   ```

## Azure OpenAI

For Azure OpenAI, use the `AzureOpenAI` client.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR_AZURE_API_KEY",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com"
)

generator = LLMGenerator(
    client=client,
    model_name="gpt-4",  # Deployment name
    default_config=gen_config
)
```

## Native Local Models (HuggingFace / vLLM Offline)

*Note: Offline `vLLM` or `HuggingFace` pipelines (i.e. without the API server) are supported via specific `LLMGenerator` subclasses or configurations. Ensure you have the necessary packages installed (`vllm`, `transformers`, `torch`).*

The `GenerationConfig` class includes parameters specific to these backends:

- **vLLM**: `min_p`, `use_beam_search`, `guided_json`, `guided_regex`
- **HuggingFace**: `repetition_penalty`, `truncation`, `padding`

Example of configuring backend-specific parameters:

```python
gen_config = GenerationConfig(
    temperature=0.7,
    # vLLM specific
    min_p=0.05,
    guided_json=my_json_schema,
    # HuggingFace specific
    repetition_penalty=1.1
)
```
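
For reference, the vLLM-specific fields above correspond to options on vLLM's native `SamplingParams`. The following is a minimal sketch of driving the offline engine directly with the `vllm` package alone (not through `LLMGenerator`); the model name and prompt are placeholders, and `min_p` requires a reasonably recent vLLM release.

```python
from vllm import LLM, SamplingParams

# Offline engine: loads the model into the local process (no API server).
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

# Sampling settings analogous to the GenerationConfig fields above.
params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    min_p=0.05,      # vLLM-specific: minimum probability relative to the top token
    max_tokens=256,
)

outputs = llm.generate(["Summarize the benefits of knowledge graphs."], params)
print(outputs[0].outputs[0].text)
```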
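
Finally, whichever OpenAI-compatible endpoint you target (DeepInfra, Azure, a local vLLM server, etc.), it is worth confirming connectivity with a plain chat-completion call before passing the client to `LLMGenerator`. A minimal check, reusing the `client` and model name from the local vLLM server example above:

```python
# Quick connectivity check using the raw OpenAI SDK (openai>=1.0).
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```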