Vector Stores & Embeddings

AutoSchemaKG uses vector embeddings to represent nodes, edges, and text passages in the Knowledge Graph. This enables semantic retrieval and similarity search.
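Similarity search over embeddings is typically done with cosine similarity. A minimal sketch with toy vectors (the numbers are illustrative, not real model output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (illustrative values only)
paris = np.array([0.9, 0.1, 0.0, 0.2])
france = np.array([0.8, 0.2, 0.1, 0.1])
banana = np.array([0.0, 0.9, 0.4, 0.0])

# Semantically related texts should score higher than unrelated ones
print(cosine_similarity(paris, france) > cosine_similarity(paris, banana))
```

In practice the vectors come from one of the embedding models below, and the search is delegated to a FAISS index rather than computed pairwise.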

Embedding Models

The framework provides a flexible BaseEmbeddingModel interface that supports various embedding backends, including HuggingFace Sentence Transformers, NVIDIA’s NV-Embed, and OpenAI-compatible APIs.

1. Sentence Transformers (Local)

For standard local embeddings using the sentence-transformers library.

from sentence_transformers import SentenceTransformer
from atlas_rag.vectorstore.embedding_model import SentenceEmbedding

# Initialize the underlying model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Wrap it with SentenceEmbedding
encoder = SentenceEmbedding(model)

# Encode text
embeddings = encoder.encode(["Hello world", "Knowledge Graph"])

2. NVIDIA NV-Embed (Local)

Optimized support for NVIDIA’s embedding models (e.g., NV-Embed-v2). This wrapper handles specific instruction formatting required by these models.

from transformers import AutoModel
from atlas_rag.vectorstore.embedding_model import NvEmbed

# Initialize the model
model = AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True)

# Wrap it
encoder = NvEmbed(model)

# Encode with query types (automatically adds instructions)
# Supported types: 'passage', 'entity', 'edge', 'fill_in_edge'
embeddings = encoder.encode("Paris is the capital of France.", query_type="passage")
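The instruction formatting applied per query type is model-specific. As an illustration only (the exact instruction strings below are assumptions, not the ones atlas_rag uses), an instruction-prefixing step typically looks like this:

```python
# Hypothetical instruction prefixes keyed by query type.
# The actual strings used by NvEmbed may differ.
INSTRUCTIONS = {
    "passage": "",  # passages are commonly embedded without an instruction
    "entity": "Instruct: Given an entity, retrieve related passages.\nEntity: ",
    "edge": "Instruct: Given a relation triple, retrieve related passages.\nTriple: ",
    "fill_in_edge": "Instruct: Given an incomplete triple, retrieve the missing element.\nTriple: ",
}

def add_instruction(text: str, query_type: str) -> str:
    """Prepend the instruction associated with a query type."""
    if query_type not in INSTRUCTIONS:
        raise ValueError(f"Unsupported query_type: {query_type!r}")
    return INSTRUCTIONS[query_type] + text

print(add_instruction("Paris", "entity"))
```

The wrapper exists precisely so callers pass a `query_type` instead of hand-building these prefixes.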

3. OpenAI-Compatible API

Use any OpenAI-compatible embedding API (OpenAI, DeepInfra, vLLM, etc.).

from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import EmbeddingAPI

# Initialize OpenAI client
client = OpenAI(api_key="...")

# Initialize wrapper
encoder = EmbeddingAPI(client, model_name="text-embedding-3-small")

# Encode
embeddings = encoder.encode(["Text to embed"])

4. Qwen Embedding (API)

Specialized wrapper for Qwen embedding models served via API, handling specific instruction formats.

from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import Qwen3Emb

# Initialize OpenAI client
client = OpenAI(api_key="...")

encoder = Qwen3Emb(client, model_name="Qwen/Qwen3-Embedding-0.6B")
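Qwen embedding models expect queries to carry an instruction prefix (Qwen's documentation describes the format as roughly `Instruct: {task}\nQuery: {text}`, with documents embedded as-is); the wrapper handles this. A sketch of that formatting step, for illustration:

```python
def format_qwen_query(task_description: str, query: str) -> str:
    """Format a query for Qwen embedding models; documents need no prefix."""
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
print(format_qwen_query(task, "What is the capital of France?"))
```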

Creating Vector Indices

Once you have an embedding model, you can generate embeddings for your Knowledge Graph components (nodes, edges, text) and create FAISS indices for efficient retrieval.

from atlas_rag.vectorstore import create_embeddings_and_index

# This function:
# 1. Reads the CSV files generated by KnowledgeGraphExtractor
# 2. Computes embeddings for nodes, edges, and source texts
# 3. Saves the embeddings back to CSV
# 4. Builds and saves FAISS indices

data = create_embeddings_and_index(
    sentence_encoder=encoder,
    model_name="all-MiniLM-L6-v2",
    working_directory="./output_dir",
    keyword="dataset_name",
    include_concept=True,  # Set to True if concepts were generated
    include_events=False,
    normalize_embeddings=True
)
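With `normalize_embeddings=True`, inner product over unit-length vectors equals cosine similarity, so the search an exact (flat) FAISS index performs can be sketched in plain NumPy (random stand-in vectors, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for node/passage embeddings and a query embedding
corpus = rng.normal(size=(100, 8)).astype("float32")
query = rng.normal(size=(8,)).astype("float32")

# Normalize, mirroring normalize_embeddings=True
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Exact inner-product search, as a flat inner-product index would do
scores = corpus @ query
top_k = np.argsort(-scores)[:5]  # indices of the 5 most similar vectors
print(top_k, scores[top_k])
```

The FAISS indices built by `create_embeddings_and_index` serve the same role at scale, avoiding the full matrix-vector product per query when approximate index types are used.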