# Vector Stores & Embeddings

AutoSchemaKG uses vector embeddings to represent nodes, edges, and text passages in the Knowledge Graph. This enables semantic retrieval and similarity search.

## Embedding Models

The framework provides a flexible `BaseEmbeddingModel` interface that supports various embedding backends, including HuggingFace Sentence Transformers, NVIDIA's NV-Embed, and OpenAI-compatible APIs.

### 1. Sentence Transformers (Local)

For standard local embeddings using the `sentence-transformers` library.

```python
from sentence_transformers import SentenceTransformer
from atlas_rag.vectorstore.embedding_model import SentenceEmbedding

# Initialize the underlying model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Wrap it with SentenceEmbedding
encoder = SentenceEmbedding(model)

# Encode text
embeddings = encoder.encode(["Hello world", "Knowledge Graph"])
```

### 2. NVIDIA NV-Embed (Local)

Optimized support for NVIDIA's embedding models (e.g., `NV-Embed-v2`). This wrapper handles the specific instruction formatting these models require.

```python
from transformers import AutoModel
from atlas_rag.vectorstore.embedding_model import NvEmbed

# Initialize the model
model = AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True)

# Wrap it
encoder = NvEmbed(model)

# Encode with query types (automatically adds instructions)
# Supported types: 'passage', 'entity', 'edge', 'fill_in_edge'
embeddings = encoder.encode("What is the capital of France?", query_type="passage")
```

### 3. OpenAI-Compatible API

Use any OpenAI-compatible embedding API (OpenAI, DeepInfra, vLLM, etc.).

```python
from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import EmbeddingAPI

# Initialize the OpenAI client
client = OpenAI(api_key="...")

# Initialize the wrapper
encoder = EmbeddingAPI(client, model_name="text-embedding-3-small")

# Encode
embeddings = encoder.encode(["Text to embed"])
```

### 4. Qwen Embedding (API)

Specialized wrapper for Qwen embedding models served via an API, handling their specific instruction formats.

```python
from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import Qwen3Emb

# Initialize the OpenAI-compatible client
client = OpenAI(api_key="...")

# Wrap it
encoder = Qwen3Emb(client, model_name="Qwen/Qwen3-Embedding-0.6B")
```

## Creating Vector Indices

Once you have an embedding model, you can generate embeddings for your Knowledge Graph components (nodes, edges, text) and create FAISS indices for efficient retrieval.

```python
from atlas_rag.vectorstore import create_embeddings_and_index

# This function:
# 1. Reads the CSV files generated by KnowledgeGraphExtractor
# 2. Computes embeddings for nodes, edges, and source texts
# 3. Saves the embeddings back to CSV
# 4. Builds and saves FAISS indices
data = create_embeddings_and_index(
    sentence_encoder=encoder,
    model_name="all-MiniLM-L6-v2",
    working_directory="./output_dir",
    keyword="dataset_name",
    include_concept=True,   # Set to True if concepts were generated
    include_events=False,
    normalize_embeddings=True
)
```
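
Once the indices are built, you can sanity-check retrieval by loading one of the saved FAISS indices and searching it with a freshly encoded query. The sketch below uses the `faiss` library directly rather than any atlas_rag retriever class; the index file path is a hypothetical example, so check which files `create_embeddings_and_index` actually writes into your working directory.

```python
import faiss
import numpy as np

# Hypothetical path -- look inside ./output_dir for the index files
# actually produced for your dataset by create_embeddings_and_index.
index = faiss.read_index("./output_dir/dataset_name_node_index.faiss")

# Encode the query with the same encoder used to build the index.
query = np.asarray(encoder.encode(["capital city of France"]), dtype="float32")

# If normalize_embeddings=True was used at build time, normalize the query
# the same way so inner-product scores behave like cosine similarity.
faiss.normalize_L2(query)

# Retrieve the 5 nearest neighbors; ids refer to items in insertion order.
scores, ids = index.search(query, 5)
print(scores, ids)
```

This kind of direct search is mainly useful for debugging embedding quality; in normal use, retrieval goes through the framework's retriever components, which consume the same indices and embedding CSVs.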